





AN INTRODUCTION TO THE 


THEORY OF STATISTICS


G. UDNY YULE, C.B.E., M.A., F.R.S.,

FELLOW OF ST JOHN'S COLLEGE, AND FORMERLY READER IN
STATISTICS, CAMBRIDGE; HONORARY VICE-PRESIDENT
OF THE ROYAL STATISTICAL SOCIETY

AND

M. G. KENDALL, M.A.,

FORMERLY MATHEMATICAL SCHOLAR OF ST JOHN'S COLLEGE, CAMBRIDGE;
FELLOW OF THE ROYAL STATISTICAL SOCIETY.


With 55 Diagrams and 4 Folding Plates.



TWELFTH EDITION, REVISED.



LONDON: 

CHARLES GRIFFIN & COMPANY, LIMITED,
42 DRURY LANE, W.C.2.

1940.


[All Rights Reserved.]



Printed in Great Britain by
Neill & Co., Ltd., Edinburgh.



ABRIDGED PREFACE TO THE FIRST 
EDITION. 


The following chapters are based on the courses of instruction given
during my tenure of the Newmarch Lectureship in Statistics at University
College, London, in the sessions 1902-1909. The variety of illustrations
and examples has, however, been increased to render the book more
suitable for the use of biologists and others besides those interested in
economic and vital statistics, and some of the more difficult parts of the
subject have been treated in greater detail than was possible in a sessional
course of some thirty lectures. To enable the student to proceed further
with the subject, fairly detailed lists of references to the original memoirs
have been given: exercises have also been added for the benefit, more
especially, of the student who is working without the assistance of a
teacher.

The volume represents an attempt to work out a systematic introductory
course on statistical methods (the methods available for discussing,
as distinct from collecting, statistical data) suited to those who
possess only a limited knowledge of mathematics; an acquaintance with
algebra up to the binomial theorem, together with such elements of
co-ordinate geometry as are now generally included therewith, is all that
is assumed. I hope that it may prove of some service to the students of
the diverse sciences in which statistical methods are now employed.

G. U. Y. 

December 1910.






PREFACE TO THE ELEVENTH EDITION. 


The “ Introduction to the Theory of Statistics ” having completed five-and- 
twenty years of life, it was decided that the time had come when a complete 
revision should be made. This, I felt, I could not personally undertake: 
it was clearly a task for a younger man, more in touch with recent literature 
and less affected by the prejudices of age in favour of the old and the 
familiar. 

Mr Kendall undertook the task not merely with willingness but with 
enthusiasm. I read his typescript, but to him is primarily and almost 
solely due the credit for suggesting the general lines of the revision, and 
for carrying out the agreed suggestions: the only new chapter for which 
I am directly responsible is Chapter 24 on Interpolation and Graduation, 
based on a few lectures sometimes included in former courses. 

I hope that in its new form the book may long continue to be of service 
to further generations of students. 

G. Udny Yule. 

Cambridge, 

July 1937. 


In the revision undertaken for this edition, apart from some substitution 
of new numerical illustrations for old, very little of the material appearing 
in earlier editions has been deleted. A few minor alterations have been 
made—the matter formerly included in supplements has been incorporated 
in the text, and there has been some rearrangement—but the major 
changes are almost entirely in the form of additions. Of these, the most 
important are several new chapters on Sampling, including an introductory
chapter on Small Samples. Chapters have also been added on
Moments and Measures of Skewness and Kurtosis, and on Simple Curve
Fitting by the Method of Least Squares. Mr Yule has contributed a new
chapter on Interpolation and Graduation. For the first time Tables of
the various functions commonly required in statistical work have been
assembled at the end of the book. Throughout the preparation of this 
new material I have had the benefit of Mr Yule’s encouragement, criticism 
and advice. 

The complete revision has presented the opportunity of issuing the
book in new form, and it is hoped that the larger page and type will be
found an improvement. A more distinctive system of paragraph numbering
and paragraph headings has been introduced. Some further Exercises
have also been added.

Notwithstanding the mathematical character of recent developments 
in statistical theory, an attempt has been made to keep within the limits 
laid down by Mr Yule for earlier editions of this book in regard to the 
knowledge of mathematics required by its readers. In one or two places 
it has been necessary to introduce the notation of the integral calculus, 
but this has been accompanied by explanations in terms of geometrical 
ideas. 

It is a pleasure to record Mr Yule’s and my indebtedness to “Student”
and the proprietors of Metron for permission to reproduce a slightly
condensed version of the former’s tables of the t-integral; and to R. A.
Fisher and Messrs Oliver & Boyd for permission to reproduce the tables
of the significance points of the z-integral from Professor Fisher’s
“Statistical Methods for Research Workers.” The tables for the 0.1 per cent.
level of z are due to W. E. Deming, Lola S. Deming and C. G. Colcord,
who have also very generously given their consent to the reproduction.

I shall feel indebted to any reader who directs my attention to possible 
errors, omissions, ambiguities or obscurities. 

M. G. K. 


London, 

July 1937.


PREFACE TO THE TWELFTH EDITION. 


It is very gratifying to be able to record that the eleventh edition of this 
book, though large, has been rapidly exhausted. I am encouraged to 
hope that the revised form of the book continues to meet the needs of 
the growing class of students of statistical theory. 

No amendments of any substance have been made in this edition. 
A few misprints have been corrected, in one or two places a paragraph 
has been modified to bring it up to date, and some references have been
added on page 529 to books and tables which have appeared since 1937; 
but apart from this, the present edition is practically the same as its 
predecessor. 

I should like to take this opportunity of thanking the publishers, 
Messrs Charles Griffin & Co., for the kindness and courtesy which they 
have shown during the preparation of this and the previous edition. 

M. G. K. 

London, 

January 1940.



CONTENTS.

Notes on Notation and on Tables for Facilitating Statistical Work
Introduction

CHAPTER
1. Theory of Attributes: Notation and Terminology
2. Consistence of Data
3. Association of Attributes
4. Partial Association
5. Manifold Classification
6. Frequency-Distributions
7. Averages and Other Measures of Location
8. Measures of Dispersion
9. Moments and Measures of Skewness and Kurtosis
10. Three Important Theoretical Distributions: the Binomial, the Normal and the Poisson
11. Correlation
12. Normal Correlation
13. Further Theory of Correlation
14. Partial Correlation
15. Correlation: Illustrations and Practical Methods
16. Miscellaneous Theorems involving the Use of the Correlation Coefficient
17. Simple Curve Fitting
18. Preliminary Notions on Sampling
19. The Sampling of Attributes: Large Samples
20. The Sampling of Variables: Large Samples
21. The Sampling of Variables: Large Samples, continued
22. The χ² Distribution
23. The Sampling of Variables: Small Samples
24. Interpolation and Graduation

References
Supplementary References

APPENDIX TABLES, ETC.

TABLE
1. Ordinates of the Normal Curve for Given Values of the Deviate
2. Areas of the Normal Curve lying to the Left of the Ordinates at Given Deviates
3. Probability that a Normal Deviate is Greater in Absolute Value than a Given Value
4. Values of the χ² Integral for One Degree of Freedom:
   A, for Values of χ² from 0 to 1
   B, for Values of χ² from 1 to 10
5. Areas of the t-Curves lying to the Left of the Ordinates at Given Deviates
6. Significance Points of the z Integral:
   A, for the 5 per cent. Level
   B, for the 1 per cent. Level
   C, for the 0.1 per cent. Level
Diagram giving the Contour Lines of the Surface P = F(ν, χ²)

Answers to Exercises
Index


LIST OF FOLDING PLATES.

Fig. 11.2. Frequency-Surface for the Rate of Discount and Ratio of Reserves to Deposits in American Banks.
Fig. 11.3. Frequency-Surface for Stature of Father and Stature of Son.
Table 11.9. Correlation between Length of Mother-frond and Length of Daughter-frond in Lemna minor.
Fig. A1. Contour Lines of the Surface P = F(ν, χ²).





NOTES ON NOTATION AND ON TABLES FOR 
FACILITATING STATISTICAL WORK. 


A. Notation. 

The reader is assumed to be familiar with the commoner mathematical 
signs, e.g. those for addition and multiplication. We shall also employ
the following symbols, all of which are in general use :— 

The Factorial Sign. 

The symbol n!, read “factorial n,” means the number

$$1 \times 2 \times 3 \times \dots \times (n-2) \times (n-1) \times n$$

Factorial n is by some writers expressed by the symbol $\underline{|n}$, but this
notation appears to be falling out of use in favour of n!, probably owing
to the greater ease with which the latter form can be printed and typewritten.
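As a minimal sketch of the definition above, the product 1 × 2 × 3 × . . . × n can be accumulated in a loop. The function name `factorial` is an illustrative choice, not part of the text; the check against Python's standard-library `math.factorial` is included only as a cross-check.

```python
import math


def factorial(n: int) -> int:
    """Return n! = 1 x 2 x 3 x ... x n for a non-negative integer n."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    product = 1
    # Multiply by each integer from 2 up to n (an empty loop for n = 0 or 1).
    for k in range(2, n + 1):
        product *= k
    return product


# The hand-written loop agrees with the standard-library routine.
assert factorial(5) == math.factorial(5) == 120
```

By the usual convention an empty product is taken as 1, so the sketch returns 1 for n = 0.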

The Combinatorial Sign. 

The symbol ${}^{n}C_{r}$ means the number of ways in which r things can be
chosen from n things, e.g. ${}^{52}C_{13}$ is the number of ways in which a hand
of cards can be dealt from an ordinary pack of 52 cards.

In most text-books on Algebra it is shown that

$${}^{n}C_{r} = \frac{n!}{r!\,(n-r)!}$$
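The factorial formula for the number of combinations can be checked numerically. The sketch below is illustrative only: the name `n_choose_r` is an assumption of this example, and Python's `math.comb` (available from Python 3.8) is used merely to confirm the result.

```python
import math


def n_choose_r(n: int, r: int) -> int:
    """Number of ways of choosing r things from n things: n! / (r! (n - r)!)."""
    if not 0 <= r <= n:
        raise ValueError("r must lie between 0 and n")
    # Integer division is exact here because the quotient is always a whole number.
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


# Number of distinct 13-card hands from an ordinary pack of 52 cards.
hands = n_choose_r(52, 13)
assert hands == math.comb(52, 13)
print(hands)
```

Choosing the 13 cards of a hand is the same as choosing the 39 cards left behind, so `n_choose_r(52, 13)` and `n_choose_r(52, 39)` give the same number.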

The Summation Sign. 

The sum of n numbers $x_1, x_2, \dots, x_n$ is written $\mathop{S}\limits_{r=1}^{r=n}(x_r)$, read “sum $x_r$
from one to n,” i.e.

$$\mathop{S}\limits_{r=1}^{r=n}(x_r) = x_1 + x_2 + x_3 + \dots + x_{n-1} + x_n$$

Where no ambiguity is likely to arise, the suffix r and the limits
written above and below S are omitted, e.g. the above sum would be
written simply S(x), it being understood from the context that the
summation extends over the n values.

Many writers use the Greek letter Σ instead of S.
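For readers who find a computation clearer than a symbol, the summation sign can be sketched as a short routine. The name `S` simply mirrors the notation of the text, and the sample values are hypothetical; Python's built-in `sum` does the same job.

```python
def S(values):
    """Return x_1 + x_2 + ... + x_n for a sequence of n numbers."""
    total = 0
    # Add each of the n values in turn; the suffix r of the text is implicit.
    for x in values:
        total += x
    return total


# Illustrative (made-up) values x_1, ..., x_5.
x = [3, 1, 4, 1, 5]
assert S(x) == sum(x) == 14
```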

The Greek Alphabet. 

As the letters of the Greek alphabet will often be used as symbols, we 
give for convenience the names of those letters. 



Small    Capital                  Small    Capital
Letter.  Letter.  Name.           Letter.  Letter.  Name.

α        Α        alpha           ν        Ν        nu
β        Β        beta            ξ        Ξ        xi
γ        Γ        gamma           ο        Ο        omicron
δ        Δ        delta           π        Π        pi
ε        Ε        epsilon         ρ        Ρ        rho
ζ        Ζ        zeta            σ, ς     Σ        sigma
η        Η        eta             τ        Τ        tau
θ        Θ        theta           υ        Υ        upsilon
ι        Ι        iota            φ        Φ        phi
κ        Κ        kappa           χ        Χ        chi (pron. ki)
λ        Λ        lambda          ψ        Ψ        psi
μ        Μ        mu              ω        Ω        omega


B. Calculating Tables. 

For heavy arithmetical work a calculating machine is invaluable;
but owing to their cost machines are, as a rule, beyond the reach of the
student.

For a great deal of simple work, especially work not intended for
publication, the student will find a slide rule exceedingly useful: particulars
and prices will be found in any instrument-maker’s catalogue.
For greater exactness in multiplying or dividing, logarithms are almost
essential.

If it is desired to avoid logarithms, use may be made of extended 
multiplication tables. There are a great many of these and some 
references to different forms are given in the list on pages 52 i 525. 

In addition to general arithmetical tables of this kind, the student
will derive invaluable aid from Barlow’s “Tables of Squares, Cubes, Square-roots,
Cube-roots, and Reciprocals of All Integral Numbers up to 10,000”
(E. & F. N. Spon, London and New York, price 7s. 6d.), which are useful
over a wide range of statistical work.

C. Special Tables of Functions Useful in Statistical Work. 

The tables and diagram at the end of this book will cover most of
the student’s ordinary requirements. Other tables appear in the works
cited on page 525. The more advanced student will find it useful to have
“Tables for Statisticians and Biometricians” (Cambridge University Press,
Part 1, price 15s., and Part 2, price 30s.), particularly Part 1. Research
workers will wish to have Fisher and Yates’ “Statistical Tables for Biological,
Agricultural and Medical Research” (Oliver & Boyd, price 12s. 6d.).

D. References to the Text. 

Each section in the book is distinguished by a number in heavy type
consisting of the number of the chapter in which the section occurs
prefixed to the number of the section in that chapter and separated from
it by a period; e.g. 7.13 means the Thirteenth Section of Chapter 7,
and 10.1 refers to the First Section of Chapter 10. The Introduction,
which precedes Chapter 1, is for this purpose regarded as Chapter 0, e.g.
0.26 refers to the Twenty-sixth Section of the Introduction. References
to sections are given simply by the number of the sections, e.g. “We saw
in 8.3” means “We saw in the Third Section of Chapter 8.”

Similarly, equations, tables, examples, exercises and diagrams are
distinguished first of all with the number of the chapter in which they
occur and then, separated by a period, with their serial number within
the chapter, e.g. “Table 6.7” refers to the Seventh Table in Chapter 6,
and “Equation (17.8)” refers to the Eighth Equation of Chapter 17.
These figures are in ordinary type.

This simple notation saves a good deal of unnecessary wording. To 
facilitate quickness of reference we sometimes give pages as well. 
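The numbering scheme just described is regular enough to be read mechanically. The following is only a sketch under the assumption that a reference is written as "chapter.serial", optionally enclosed in brackets as equation numbers are; the function name `parse_reference` is hypothetical.

```python
from typing import Tuple


def parse_reference(ref: str) -> Tuple[int, int]:
    """Split a reference such as "7.13" or "(17.8)" into (chapter, serial number)."""
    # Equation numbers are printed in brackets; drop them before splitting.
    chapter, serial = ref.strip("()").split(".")
    return int(chapter), int(serial)


# 7.13 is the Thirteenth Section of Chapter 7.
assert parse_reference("7.13") == (7, 13)
# The Introduction counts as Chapter 0, so 0.26 is its Twenty-sixth Section.
assert parse_reference("0.26") == (0, 26)
# Equation (17.8) is the Eighth Equation of Chapter 17.
assert parse_reference("(17.8)") == (17, 8)
```

The two parts are kept as separate integers rather than as one decimal number, since 10.1 and 10.10 would denote different sections.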

A distinction is drawn between examples, which are given in the text
for purposes of illustration, and exercises, which are set at the end of the
chapter for the student to work out for himself.




THEORY OF STATISTICS.




INTRODUCTION. 

Number and Measurement. 

0.1. Western civilisation is pervaded by ideas of number and measurement.
Even the events of our everyday life are inextricably bound up
with them. We have only to picture a race which cannot count or measure 
trying to run the Bank of England, or control the milk market, or even 
understand the sporting columns of the daily press, to realise how deeply 
rooted numbers are in the complex activities of the modern world. 

0.2. Science itself is particularly indebted to numerical expression. 
As organised knowledge has increased, the necessity for precision has 
become greater, and in the formulation of precise statements number and
measurement have played a leading part. The desire for quantitative 
expression was first felt in the physical sciences, but it has now spread into 
nearly all branches of knowledge. The movement is by no means complete,
however, and may be seen at work to-day. As a significant instance
we may note that courageous attempts are being made to subject the 
process of thought itself—that last stronghold of the contentious and the 
mysterious—to quantitative inquiry. 

0.3. Many people, in fact, have been led by their enthusiasm for
numerical data to regard knowledge of a non-quantitative kind as hardly
deserving the name “knowledge” at all. Towards the close of the nineteenth
century it was possible for Lord Kelvin to say: “When you can
measure what you are speaking about and express it in numbers you know 
something about it; but when you cannot measure it, when you cannot 
express it in numbers, your knowledge is of a meagre and unsatisfactory 
kind.” This remark has often been quoted with an approval which it does 
not altogether deserve—it does not, for example, do justice to the work of 
Darwin and Pasteur, to name only two of Kelvin’s contemporaries. But 
there can be no denying that it expresses a point of view which many 
people will endorse. 

Numerical Data. 

0.4. The desire for precision, in fact, leads investigators of all kinds, 
from the atomic physicist to the business man, to express the facts about 
that part of the universe which interests them in a quantitative way. 
Numerical data have come into being not only in the laboratory and the 
study, but in the counting-house, the sales department, the Board Room 
and the legislative assembly. It is difficult to see how our society could be 
organised without them. Where the Jews and the Romans were content
with occasional censuses for military or fiscal purposes,¹ the progressive
modern state finds itself under the necessity of keeping a close and quantitative
eye on all that goes on within or without its frontier. A country
which does not do so may be fairly regarded as backward. In a typical
phrase Anatole France summed up this point of view when he said of the
Chinese: “Tant qu’ils ne se seront pas comptés, ils ne compteront pas”
(if they don’t count they won’t count).

Statistics Concerned with Numerical Data.

0.5. There are certain features of numerical data, no matter in what
branch of knowledge they originate, which may call for a special type of 
scientific method to treat them and elucidate them. This is known as 
Statistical Method, or more briefly, as Statistics. It does not, however, 
embrace the study of numerical data of every kind, and before we attempt 
a formal definition of its nature and scope, it is necessary to give some 
words of explanation.

Effects and Causes. 

0.6. One of the principal aims of Science is to trace, amidst the tangled
complex of the external world, the operation of what are called “laws”:
to interpret a multiplicity of natural phenomena in terms of a few fundamental
principles. A knowledge of the operation of these laws enables us
to talk of “cause” and “effect.” The metaphysical problems associated
with these words need not detain us, but since in the sequel we shall often
use them, it is proper to explain that we adopt them as a convenient way
of expressing serviceable and familiar ideas. We need not worry if the 
atomic physicist says that causation must be rejected. We shall be dealing 
with the everyday world, where “ law ” and “ cause ” have significant and 
important connotations. 

0.7. With this convention, we may say that any physical event, and
in particular that described by quantitative data, is produced by the 
operation of one or more causes. The number of causes which produce any 
particular effect may be, and usually is, extremely large. For instance, 
the height of a man is causally linked with his race, his ancestry, his 
habitation, his diet during youth, his age, his occupation, and at any given 
moment even with his position and the time of day. 

0.8. Experiment, the great weapon of scientific inquiry, derives its 
power from the ability of the experimenter to replace such complex 
systems of causation by simple systems in which only one causal circumstance
is allowed to vary at a time. This is perhaps an ideal, but it is
one which is closely approached with the technique of modern laboratory 
practice. 

¹ David (II Samuel, 24) numbered the people of Israel and called down a plague by
doing so. He counted 800,000 valiant men who drew the sword, and though the text
is not entirely clear it seems likely that Divine disapproval was directed against the
militaristic purpose of the census, not the census itself. We are told later that 70,000
men died of the resulting pestilence, so it looks as if there was no ban on counting dead
men.

0.9. Let us, however, turn to social science, as the parent of the
methods termed “statistical,” for a moment, and consider its characteristics
as compared, say, with physics or chemistry. One characteristic
stands out so markedly that attention has been repeatedly directed to it
by “statistical” writers as the source of the peculiar difficulties of their
science: the observer of social facts cannot experiment, but must deal with
circumstances as they occur, apart from his control. The simplification open
to the experimenter being impossible, the observer has, in general, to deal
with highly complicated cases of multiple causation: cases in which a given
result may be due to any one of a number of alternative causes or to a
number of different causes acting conjointly.

0.10. A little consideration will show that this is also characteristic
of observations in other fields. The meteorologist, for example, is in
almost precisely the same position as the student of social science. He can
experiment on minor points, but the records of the barometer, thermometer
and rain gauge have to be treated as they stand. With the biologist,
matters are somewhat better. He can and does apply experimental
methods to a very large extent, but frequently cannot approximate closely 
to the experimental ideal; the internal circumstances of animals and plants 
too easily evade complete control. Hence a large field (notably the study 
of variation and heredity) is left in which methods of experiment have to be 
supplemented by other methods. The physicist and chemist, finally, stand 
at the other extremity of the scale. Theirs are the sciences in which 
experiment has been brought to its greatest perfection. But even so, there 
is still scope for the application of statistical treatment in these sciences. 
The methods available for eliminating the effect of disturbing circumstances, 
though continually improved, are not, and cannot be, absolutely perfect. 
The observer himself, as well as the observing instrument, is a source of 
error; the effects of changes of temperature, or of moisture, or pressure, 
and draughts, vibration, etc., cannot be completely eliminated. 

0.11. It is with data affected by numerous causes that Statistics is 
mainly concerned. Experiment seeks to disentangle a complex of causes 
by removing all but one of them, or rather by concentrating on the study 
of one and reducing the others as far as circumstances permit to a comparatively
small residuum. Statistics, denied this resource, must accept
for analysis data subject to the influence of a host of causes, and try to 
discover from the data themselves which causes are the important ones and 
how much of the observed effect is due to the operation of each. 

Definitions. 

0.12. In the light of the foregoing discussion we may accordingly give 
the following definitions :— 

By Statistics we mean quantitative data affected to a marked extent 
by a multiplicity of causes.

By Statistical Methods we mean methods specially adapted to the 
elucidation of quantitative data affected by a multiplicity of causes. 

By Theory of Statistics or, more briefly, Statistics we mean the 
exposition of statistical methods. 

(It will be observed that the same word may be used both for the science 
and for the raw material on which it works. This dual use gives rise to no 
confusion in practice, but the distinction is worth bearing in mind.) 

Use of “Statistic.”

0.13. This is perhaps the appropriate place to remark that there has
recently come into use the singular form “statistic.” This is the name
given to a particular kind of estimate compiled from observations, usually
according to some algebraical formula. In this book we shall rarely,
if ever, have occasion to use the term, and we mention it mainly to
forewarn the student who may meet the term elsewhere or in further
reading. We may also point out that Statistics is not confined to the
study of such entities any more than Physics is the study of individual
articles of physic.

History of the word “ Statistics.” 

0.14. In their present meaning the words “statistics,” “statistician”
and “statistical” are less than a century old. They have, however, been
in use longer than that, and it is instructive to consider the process by
which they have reached their present meaning.

0.15. The words “statist,” “statistics,” “statistical” appear to be
all derived, more or less indirectly, from the Latin status, in the sense,
acquired in mediaeval Latin, of a political State.

0.16. The first term is, however, of much earlier date than the two
others. The word “statist” is found, for instance, in Hamlet (1602),¹
Cymbeline (1610 or 1611),² and in Paradise Regained (1671).³ The earliest
occurrence of the word “statistics” yet noted is in “The Elements of Universal
Erudition” by Baron J. F. von Bielfeld, translated by W. Hooper,
M.D. (3 vols., London, 1770). One of its chapters is entitled Statistics, and
contains a definition of the subject as “The science that teaches us what is
the political arrangement of all the modern states of the known world.”⁴
“Statistics” occurs again with a rather wider definition in the preface to
“A Political Survey of the Present State of Europe,” by E. A. W. Zimmermann,⁵
issued in 1787. “It is about forty years ago,” says Zimmermann, “that
that branch of political knowledge, which has for its object the actual and
relative power of the several modern states, the power arising from their
natural advantages, the industry and civilisation of their inhabitants, and
the wisdom of their governments, has been formed, chiefly by German
writers, into a separate science. . . . By the more convenient form it has
now received . . . this science, distinguished by the new-coined name of
statistics, is become a favourite study in Germany” (p. ii); and the
adjective is also given (p. v): “To the several articles contained in this
work, some respectable statistical writers have added a view of the principal
epochas of the history of each country.”

0.17. Within the next few years the words were adopted by several
writers, notably by Sir John Sinclair, the editor and organiser of the first
“Statistical Account of Scotland,”⁶ to whom, indeed, their introduction has
been frequently ascribed. In the circular letter to the Clergy of the Church
of Scotland issued in May 1790,⁷ he states that in Germany “‘Statistical
Inquiries,’ as they are called, have been carried to a very great extent,”
and adds an explanatory footnote to the phrase “Statistical Inquiries”:

¹ Act 5, sc. 2. ² Act 2, sc. 4. ³ Bk. 4.

⁴ We cite from Dr W. F. Willcox, Quarterly Publications of the American Statistical
Association, vol. 14, 1914, p. 287.

⁵ Zimmermann’s work appears to have been written in English, though he was a
German, Professor of Natural Philosophy at Brunswick.

⁶ Twenty-one vols., 1791-99.

⁷ “Statistical Account,” vol. 20, Appendix to “The History of the Origin and
Progress . . .” given at the end of the volume.




“or inquiries respecting the population, the political circumstances, the productions
of a country, and other matters of state.” In the “History of the
Origin and Progress”¹ of the work, he tells us, “Many people were at first
surprised at my using the new words, Statistics and Statistical, as it was
supposed that some term in our own language might have expressed the
same meaning. But in the course of a very extensive tour, through the
northern parts of Europe, which I happened to take in 1786, I found that in
Germany they were engaged in a species of political inquiry, to which they
had given the name of Statistics;² . . . as I thought that a new word might
attract more public attention, I resolved on adopting it, and I hope that it
is now completely naturalised and incorporated with our language.” This
hope was certainly justified, but the meaning of the word underwent rapid
development during the half-century or so following its introduction.

0.18. “Statistics” (Statistik), as the term was used by German
writers of the eighteenth century, by Zimmermann and by Sir John
Sinclair, meant simply the exposition of the noteworthy characteristics
of a state, the mode of exposition being, almost inevitably at that time,
preponderantly verbal. The conciseness and definite character of
numerical data were recognised at a comparatively early period, more
particularly by English writers, but trustworthy figures were scarce.
After the commencement of the nineteenth century, however, the growth
of official data was continuous, and numerical statements, accordingly,
began more and more to displace the verbal descriptions of earlier days.
“Statistics” thus insensibly acquired a narrower signification, viz. the
exposition of the characteristics of a State by numerical methods. It is
difficult to say at what epoch the word came definitely to bear this
quantitative meaning, but the transition appears to have been only half
accomplished even after the foundation of the Royal Statistical Society
in 1834. The articles in the first volume of the Journal, issued in 1838-39,
are for the most part of a numerical character, but the official definition
has no reference to method. “Statistics,” we read, “may be said, in the
words of the prospectus of this Society, to be the ascertaining and bringing
together of those facts which are calculated to illustrate the condition
and prospects of society.” It is, however, admitted that “the statist
commonly prefers to employ figures and tabular exhibitions.”

0.19. Once the first change of meaning was accomplished, further 
changes followed. From the name of a science, the word was transferred 
to those series of figures on which it operated, so that one spoke of vital 
statistics, poor-law statistics, and so on. It was then applied to the 
similar numerical data which occurred in other sciences, such as anthropology
and meteorology. By the end of the nineteenth century we find
“statistics of mental characteristics in man,” “statistics of children
under the headings bright-average-dull,” and even “an examination of
the characteristics of the Virgilian hexameter with statistics.” The
development of the meaning of the adjective “statistical” and the noun
“statistician” was naturally similar.

¹ Loc. cit., p. xiii.

² The “Abriss der Staatswissenschaft der Europäischen Reiche” (1749) of Gottfried
Achenwall, Professor of Politics at Göttingen, is the volume in which the word
“Statistik” appears to be first employed, but the adjective “statisticus” occurs at a
somewhat earlier date in works written in Latin.




0.20. Perhaps the most abstract use of the word occurs in the theory 
of thermodynamics, wherein one speaks of entropy as proportional to the 
logarithm of the statistical probability of the universe —a definition which 
no statesman would be unwilling to admit to lie completely outside his 
purview. But it is unnecessary to multiply instances to show that the 
word “ statistics ” is now entirely divorced from “ matters of State.” 

The Theory of Statistics. 

0.21. The theory of statistics as a distinct branch of scientific method 
is of comparatively recent growth. Its roots may be traced in the work
of Laplace and Gauss on the theory of errors of observation, but the 
study itself did not begin to flourish until the last quarter of the nineteenth 
century. Under the influence of Galton and Karl Pearson remarkable 
progress was made, and the foundations of the subject were laid in the 
next thirty years—as it has turned out, very securely. The subject has 
not, however, yet reached a stage whereat a cut-and-dried exposition of 
its methods can be given. Research, particularly into the mathematical 
theory of statistics, is rapidly proceeding, and fresh discoveries are being 
made with a rapidity which makes it difficult to keep pace with them. 
It may, however, help the student to appreciate the work of later chapters 
if we sketch in brief general terms the field of statistical theory as it now 
exists. 

The Collection of Data. 

0.22. The first question which the statistician has to consider is the 
collection and assembling of his data. In many fields, such as economics
and sociology, he cannot prepare the data himself but has to get what 
he can from such sources as official statistics, which are usually prepared 
with an object differing from his own. Such information is therefore 
rarely all that one could wish. Investigator A, studying the sugar 
market, finds that the official figures run cane and beet sugar together. 
Investigator B, wanting to compare prices over a period of years, finds
that during the war period 1914-18 there is a gap in the information. 
Investigator C, wishing to study poverty, has to content himself with 
indirect figures such as those of poor-law relief and unemployment. But 
however incomplete the data may be, and however tangentially pertinent 
to his inquiry, the investigator must take what he can get and be thankful.

0.23. In other cases, and particularly in meteorology, biology and
psychology, he can produce his own data or borrow those of other
investigators similarly engaged. He does not merely take his figures from some
source or other ; he is instrumental in their production, and within limits 
can control their nature so as to bring them to bear directly on his inquiry. 

It might be thought that the only qualities required for such work are 
an ability to count or measure and a reasonable care. But this is not so. 
Once outside the laboratory the investigator is beset with a swarm of 
practical difficulties. We might illustrate the point by referring to the 
troubles of an investigator who wished to find out how many dairy cows 
there were in a certain parish. He took the simplest course and went to 
all the farms in the parish and asked the occupier how many cows he had. 
Farmer A said that he had fifteen, but had sold eight and was waiting
for the buyer to come and fetch them. Farmer B had “about twenty.”
Farmer C obviously could not be bothered and said the first figure which
came into his head; and so on. It is clear that the result of such an
inquiry would be to give a quite illusory figure.

0.24. A full discussion of such matters lies outside the scope of this 
book, but we have given them more than a passing mention in order to 
introduce one very necessary caution. 

The reliability of data must always be examined before any attempt 
is made to base conclusions on them. This is true of all data, but
particularly so of numerical data, which do not carry their quality written 
large upon them. It is a waste of time to apply the refined theoretical 
methods of statistics to data which are suspect from the beginning. 

The Treatment of Data. 

0.25. Having obtained his data and satisfied himself that they are 
reliable enough to permit him to proceed, the statistician must then “ lick 
them into shape.” He must decide on some form of arrangement and 
presentation, reduce them to a convenient scale of units, and so on; in 
short, he must work on his raw material until it is ready for the application 
of his prepared tools. 

0.26. The only process of treatment to which attention need be called 
is that of condensation. The mind is incapable of grasping the significance 
of a large mass of figures. If, therefore, the quantity of data available 
is of any size, some process of condensation is necessary to enable the 
mind to appreciate the picture which the data represent. 

Suppose, for instance, we are discussing the stature of a thousand men, 
and have as data the height of each man to the nearest inch. Our raw 
material then consists of a thousand sets of figures ranging from four feet 
to seven feet, or thereabouts. Only the supermind could look over these 
figures and grasp their essentials. Nor would the position be met by 
rearranging the figures in order of magnitude. To get a clear picture of 
the situation some condensation is necessary, and in this case it can be 
carried out easily by grouping together all the men whose heights lie in a 
certain range, say of three inches. Our total range of three feet is then 
replaced by twelve sub-ranges, each of three inches, and we may summarise 
the data by giving the numbers of men who fall into the twelve sub-ranges. 
In short, we have replaced our original thousand figures by twelve. 
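
The grouping just described may be sketched in a few lines of Python. The heights used here are synthetic, generated only for the purpose of illustration (the text gives no actual figures), and the name `condense` and its default limits of 48 inches and twelve sub-ranges of three inches are our own assumptions.

```python
import random
from collections import Counter

def condense(heights_in, low=48, width=3, n_classes=12):
    """Count how many heights (in whole inches) fall in each sub-range.

    Sub-range k runs from low + k*width up to, but not including,
    low + (k+1)*width. A height outside the total range is placed
    in the nearest end sub-range.
    """
    counts = Counter()
    for h in heights_in:
        k = (h - low) // width
        k = min(max(k, 0), n_classes - 1)
        counts[k] += 1
    return [counts[k] for k in range(n_classes)]

# Synthetic data: a thousand heights to the nearest inch.
# These are NOT observed figures; mean 68 and spread 3 are assumptions.
rng = random.Random(0)
heights = [round(rng.gauss(68, 3)) for _ in range(1000)]

table = condense(heights)
for k, count in enumerate(table):
    lo = 48 + 3 * k
    print(f"{lo}-{lo + 2} in.: {count}")
```

The thousand original figures are replaced by the twelve numbers in `table`. The individual heights within each sub-range can no longer be recovered from it, which is the sacrifice of information discussed in the next paragraph.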

0.27. It will be clear that in so doing we have sacrificed a certain 
amount of information. Twelve figures cannot possibly tell us as much as 
a thousand. It may very well be, however, that the information in the
twelve is all that we require; the lost information may be irrelevant to
the inquiry. Such a case would happen if we wanted to know, to an inch
or so, what was the height exhibited by the greatest number of men. 

0.28. The process of condensation thus sacrifices information but 
gives us instead a very necessary clarity and adaptability for manipulation. 
How far the process is carried in any particular case will depend on how far 
the disadvantages of the sacrifice are offset by the advantages of the clarity. 

Summarising and Descriptive Statistics. 

0.29. The process of summarising which we have just described may
be carried a great deal further, and leads to a branch of theory which has 
very important practical applications. 



The reader is probably familiar already with the idea of an “average
value,” and with its use in compressing into a single number the results of
a series of observations. Such quantities are, in fact, the result of
summarising to the greatest possible extent; they are summaries in which the
statistician has distilled the information of a diffuse mass of figures into a 
single drop, so to speak. 

0.30. There is a wide demand for such summarising numbers, and a
good deal of this book will be devoted to considering them from one aspect
or another. They give a convenient bird’s-eye view of what is sometimes
a complex and confusing whole. Special sciences have evolved special
quantities of this type to meet their own needs. For instance, the
economist has invented various kinds of index numbers to express in a
shorthand way complicated changes in prices; and the psychologist has devised
coefficients to express the reactions of an individual mind to a sequence of
tests.

0.31. The remarks we made in 0.27 and 0.28 apply here with additional
force. It must never be forgotten that in summarising we omit. Part of 
the statistician’s task is to see that we do not omit too much. 

0.32. The problem of describing a complicated set of data in as 
few terms as possible is facilitated by the use of mathematical functions. 
Suppose, for instance, that in the thousand men of 0.26 we assumed that 
the number of men (y) of height x inches varied as the square of x,
frankly a most improbable result, but one which will serve for the purposes
of illustration. Then we may describe the data completely by an equation
of the form

$$y = ax^2$$

where a is a constant to be determined from the data. Knowing a, we can
find the number of men of any given height.

0.33. In this case it rather looks as if we have condensed all the
information into a single number a without losing any of it. But that is
not so. What we have done is to replace the set of a thousand figures by
an assumption about their nature. We have lost none of the information 
because we assumed, in using the equation, that the information was of 
a type known to us already. 
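
As a rough illustration of what "determining a from the data" might involve, the following sketch fits the constant by the method of least squares. That method is our own choice for the example; the text does not say how a is to be found. The figures are invented so as to lie exactly on the assumed curve, and the name `fit_a` is hypothetical.

```python
def fit_a(xs, ys):
    """Least-squares value of a in y = a * x**2.

    Minimising the sum of (y - a*x**2)**2 over a gives
    a = sum(x**2 * y) / sum(x**4). This is one possible rule,
    assumed here for illustration only.
    """
    num = sum(x**2 * y for x, y in zip(xs, ys))
    den = sum(x**4 for x in xs)
    return num / den

# Invented figures lying exactly on y = 0.002 * x**2.
xs = [60, 63, 66, 69, 72]
ys = [0.002 * x**2 for x in xs]

a = fit_a(xs, ys)

# Knowing a, the assumed law gives y for any height whatever,
# including heights that were never observed.
predicted_at_70 = a * 70**2
```

The single number `a` reproduces the figures only because they were made to obey the assumed law. With real data the same calculation would still return a value of a, which is why the assumption itself needs examination.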

0.34. It is found in practice that many sets of data may be very
conveniently expressed by mathematical functions. The question as to which
functions are the most suitable for purposes of description leads to some 
interesting theory, some of which will be dealt with later and some of which 
is of an advanced character lying outside the scope of an Introduction to 
the Theory of Statistics. Such functions are particularly helpful in the 
theory of sampling. 

Analysis of Data. 

0.35. When the statistician has arranged and compressed his data into 
a suitable form, or decided on the functions and evaluated the
quantities which he has chosen to describe them, the first stage of his inquiry is
finished. It may be that he would wish to take it no further ; for instance, 
if he is preparing an index number for the economist he may wish to hand 
over the number to that person without comment, for him to make such 
use of it as he thinks fit. More frequently, however, he has prepared the
data for his own use as a statistician. He then proceeds to the next stage,
that of analysis and elucidation of the causal system which gave rise to
them.

0.36. The methods for such purposes are very numerous. In this
brief review we need only point out the importance of the investigation of
relationship, the theory of which bulks very large in statistical literature.
If two events are related there is usually, though not always, some causal 
nexus between them. The problems of the investigation of relationship 
between phenomena lead to the theory of dependence, contingency and 
correlation, and the formulation of various coefficients to measure the 
extent to which one set of events depends upon another. 

Sampling. 

0.37. When we wish to discuss the properties of an aggregate we may
be prevented by practical or theoretical reasons from examining every
single member of it. For example, in considering the stature of the male
inhabitants of the United Kingdom we cannot measure every man, because 
of the time and trouble involved ; and in considering the scores of a roulette 
wheel we cannot examine every score, because the number is practically 
infinite and observations can be continued as long as the wheel lasts. 

0.38. We do not despair, nevertheless, of being able to gain some 
knowledge of the aggregate. Where we cannot take the whole we do the 
best we can and try to obtain a selection of members. This selection is 
called a sample. 

0.39. It is clear that a sample will not tell us everything about the 
parent aggregate from which it is derived. Nevertheless, most people have 
a feeling, and we shall see later in this book that under certain conditions 
the feeling is a justifiable one, that the sample will give us some information 
about the parent. Values calculated from the sample may be taken to be 
estimates of values in the parent, to a degree of approximation which 
becomes closer as the sample gets larger ; and even where the sample is 
small we can sometimes draw inferences of a general nature about the
parent. 
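
The statement that estimates become closer as the sample gets larger may be tried out by a small simulation. Everything in the sketch below is assumed for illustration: the parent aggregate is a synthetic set of 100,000 heights, and random sampling without replacement is used as the sampling technique.

```python
import random
from statistics import mean

rng = random.Random(1)

# Hypothetical parent aggregate: 100,000 heights in inches (synthetic).
parent = [rng.gauss(68, 3) for _ in range(100_000)]
parent_mean = mean(parent)

def sample_mean(size):
    """Mean of a random sample of the given size drawn from the parent."""
    return mean(rng.sample(parent, size))

for size in (10, 100, 1000, 10_000):
    estimate = sample_mean(size)
    print(f"sample of {size:>6}: mean {estimate:.2f}, "
          f"parent mean {parent_mean:.2f}")
```

Any single run shows only one sample of each size, so a smaller sample may occasionally happen to lie nearer the parent value than a larger one. The tendency appears when the experiment is repeated many times.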

0.40. We are rarely, if ever, able to reason from the sample to the 
parent with the categorical certainty of a mathematical proof. Our 
inferences will usually be expressed in terms of probabilities. Moreover, 
we shall find it much easier to reject a hypothesis than to accept it. 
Our inferences will generally be not of the type “the hypothesis H
is true,” or even “the hypothesis H is probably true,” but of the type
“hypotheses A, B and C are probably untrue, but we see no reason to
doubt hypothesis H.”

For example, suppose we take a sample of a thousand men from the 
population of the United Kingdom and find their average height to be 
5 ft. 8 in. What can we say about the average height of the population as
a whole? We cannot give it with any certainty. We cannot even say,
with certainty, that it lies within, say, one inch of 5 ft. 8 in. What we can
say, assuming that the sampling technique is sound, will be something to
the effect that a hypothesis which supposes that the mean of the whole
population is greater than 5 ft. 9 in. or less than 5 ft. 7 in. is probably
incorrect, but that the data are consistent with the supposition that the
mean lies between those limits.





0.41. The theory of sampling is thus closely bound up with the theory 
of probability. The many problems which arise in this connection are 
among the most interesting and at times the most difficult which science 
and philosophy can offer. It is only fair to warn the student that there 
still exists an important difference of opinion among scientific men about 
the validity of certain types of statistical inference. In this book we have, 
so far as we could, avoided these contentious matters, but the advanced 
student will have to be prepared to face them sooner or later. 

The Popular Attitude towards Statistics. 

0.42. Finally, to conclude this introduction we may, perhaps, refer 
to the popular mistrust of statistics and statistical methods. 

The layman’s attitude towards statistics is admirably summed up in 
the remark that mankind is divided into two parts, those who say that 
figures can prove anything and those who assert that they can prove 
nothing. It must be admitted that this attitude is not unreasonable. 
From the advertisement hoarding, from the electioneering platform, from 
the partisan press and from a dozen other sources the man in the street is 
bombarded with tendentious figures put forward to support some ex parte 
statement. Sometimes such figures are justifiably used to form a basis for 
the arguments which are built upon them ; more often they give a specious 
picture of the truth, which may be due to ignorance or inadvertence, but 
has also been known to be occasioned by a deliberate wish to mislead. 
The layman is well aware of this fact. His attitude in distrusting all 
arguments based on figures is that of a reasonable man, who has not the 
training to distinguish for himself the true from the false, and is therefore 
inclined to suspect everything. 

0.43. We are not concerned here with the vindication of statistics in
the public view. We have alluded to the matter in order to remind the
student that statistical methods are most dangerous tools in the hands of
the inexpert. Few subjects have a wider application ; no subject requires 
such care in that application. Statistics is one of those sciences whose 
adepts must exercise the self-restraint of an artist. 



CHAPTER 1. 


THEORY OF ATTRIBUTES: NOTATION AND
TERMINOLOGY. 

Attributes and Variables. 

1.1. The methods of statistics, as defined in the Introduction, deal
with quantitative data alone. The quantitative character may, however, 
arise in two different ways. 

In the first place, the observer may note only the presence or absence 
of some attribute in a series of objects or individuals, and count how many
do or do not possess it. Thus, in a given population, we may count the 
number of the blind and seeing, the dumb and speaking, or the insane and 
sane. The quantitative character, in such cases, arises solely in the 
counting. 

In the second place, the observer may note or measure the actual 
magnitude of some variable character for each of the objects or
individuals observed. He may record, for instance, the ages of persons at
death, the prices of different samples of a commodity, the statures of men,
the numbers of petals in flowers. The observations in these cases are
quantitative ab initio.

1.2. The methods applicable to the former kind of observations, 
which may be termed statistics of attributes, are also applicable to the 
latter, or statistics of variables. A record of statures of men, for 
example, may be treated by simply counting all measurements as tall that 
exceed a certain limit, neglecting the magnitude of any excess, and 
stating the numbers of tall and short (or more strictly not-tall) on the basis 
of this classification. Similarly, the methods that are specially adapted to
the treatment of statistics of variables, making use of each value recorded, 
are available to a greater extent than might at first sight seem possible for 
dealing with statistics of attributes. For example, we may treat the 
presence or absence of the attribute as corresponding to the changes of a 
variable which can only possess two values, say 0 and 1. Or, we may 
assume that we have really to do with a variable character which has been 
crudely classified, as suggested above, and we may be able, by auxiliary 
hypotheses as to the nature of this variable, to draw further conclusions. 
But the methods and principles developed for the case in which the observer
only notes the presence or absence of attributes are the simplest and most
fundamental, and are best considered first. This and the next four 
chapters are accordingly devoted to the Theory of Attributes. 
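
Both directions of this interchange can be shown in a short Python sketch. The statures and the limit of 69 inches are invented for the illustration; the text fixes neither.

```python
# Illustrative statures in inches; the values are made up.
statures = [64, 71, 68, 73, 66, 70]
LIMIT = 69  # arbitrary limit above which a man is counted as "tall"

# Variable -> attribute: note only whether the limit is exceeded,
# neglecting the magnitude of any excess.
is_tall = [s > LIMIT for s in statures]
n_tall = sum(is_tall)
n_not_tall = len(statures) - n_tall

# Attribute -> variable: treat presence and absence as the
# values 1 and 0 of a variable that can take only two values.
coded = [1 if t else 0 for t in is_tall]
mean_of_coded = sum(coded) / len(coded)  # the proportion counted as tall
```

The mean of the 0 and 1 values is simply the proportion of individuals possessing the attribute, which is one reason why methods for variables can be carried over to attributes.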

Classification with reference to Attributes. 

1.3. The objects or individuals that possess the attribute, and those 
that do not possess it, may be said to be members of two distinct classes,
the observer classifying the objects or individuals observed. In the
simplest case, where attention is paid to one attribute alone, only two
mutually exclusive classes are formed. If several attributes are noted,
the process of classification may, however, be continued indefinitely. 
Those that do and do not possess the first attribute may be reclassified 
according as they do or do not possess the second, the members of each of 
the sub-classes so formed according as they do or do not possess the third, 
and so on, every class being divided into two at each step. Thus the 
members of the population of any district may be classified into males and 
females ; the members of each sex into sane and insane ; the insane males, 
sane males, insane females and sane females into blind and seeing. If we 
were dealing with a number of peas (Pisum sativum) of different varieties,
they might be classified as tall or dwarf, with green seeds or yellow seeds,
with wrinkled seeds or round seeds, so that we should have eight classes:
tall with round green seeds, tall with round yellow seeds, tall with wrinkled
green seeds, tall with wrinkled yellow seeds, and four similar classes of 
dwarf plants. 
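
The repeated division into two may be sketched as follows, using the pea illustration. The labels and the separator are our own choices for the example.

```python
from itertools import product

# Each attribute contributes one dichotomy.
dichotomies = [
    ("tall", "dwarf"),
    ("round seeds", "wrinkled seeds"),
    ("green seeds", "yellow seeds"),
]

# Taking one alternative from every dichotomy, in every possible way,
# gives the classes formed by successive division.
classes = [" / ".join(choice) for choice in product(*dichotomies)]

for name in classes:
    print(name)
```

Three dichotomies yield 2 x 2 x 2 = 8 classes, and every further attribute noted doubles the number.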

1.4. It may be noticed that the fact of classification does not
necessarily imply the existence of either a natural or a clearly defined boundary
between the two classes. The boundary may be wholly arbitrary, e.g.
where prices are classified as above or below some special value, barometer
readings as above or below some particular height. The division may also
be vague and uncertain: sanity and insanity, sight and blindness, pass into
each other by such fine gradations that judgments may differ as to the 
class in which a given individual should be entered. The possibility of 
uncertainties of this kind should always be borne in mind in considering 
statistics of attributes : whatever the nature of the classification, however, 
natural or artificial, definite or uncertain, the final judgment must be
decisive ; any one object or individual must be held either to possess the 
given attribute or not. 

Dichotomy. 

1.5. A classification of the simple kind considered, in which each 
class is divided into two sub-classes and no more, has been termed by 
logicians classification, or, to use the more strictly applicable term, 
division by dichotomy (cutting in two). The classifications of most 
statistics are not dichotomous, for most usually a class is divided into 
more than two sub-classes, but dichotomy is the fundamental case. In
Chapter 5 the relation of dichotomy to more elaborate (manifold, instead 
of twofold or dichotomous) processes of classification, and the methods 
applicable to some such cases, are dealt with briefly.

1.6. For theoretical purposes it is necessary to have some simple 
notation for the classes formed, and for the numbers of observations 
assigned to each. 

The capitals A, B, C, . . . will be used to denote the several attributes.
An object or individual possessing the attribute A will be termed simply
A. The class, all the members of which possess the attribute A, will
be termed the class A. It is convenient to use single symbols also, to
denote the absence of the attributes A, B, C, . . . We shall employ the
Greek letters α, β, γ, . . . Thus if A represents the attribute blindness,
α represents sight, i.e. non-blindness; if B stands for deafness, β stands
for hearing. Generally “α” is equivalent to “not-A,” or an object or
individual not possessing the attribute A; the class α is equivalent to the
class none of the members of which possesses the attribute A.

1.7. Combinations of attributes will be represented by juxtapositions
of letters. Thus if, as above, A represents blindness, B deafness, AB
represents the combination blindness and deafness. If the presence and
absence of these attributes be noted, the four classes so formed, viz. AB,
Aβ, αB, αβ, include respectively the blind and deaf, the blind but not-deaf,
the deaf but not-blind, and the neither blind nor deaf. If a third attribute
be noted, e.g. insanity, denoted say by C, the class ABC includes those
who are at once deaf, blind and insane, ABγ those who are deaf and blind
but not insane, and so on.

Any letter or combination of letters like A, AB, αβ, ABγ, by means
of which we specify the characters of the members of a class, may be
termed a class symbol.


Class-frequencies.

1.8. The number of observations assigned to any class is termed, for
brevity, the frequency of the class, or the class-frequency.
Class-frequencies will be denoted by enclosing the corresponding class-symbols
in brackets. Thus:


(A)    denotes number of A's,    i.e. objects possessing attribute A
(α)       „       „      α's,     „      „    not possessing attribute A
(AB)      „       „      AB's,    „      „    possessing attributes A and B
(αB)      „       „      αB's,    „      „    possessing attribute B but not A
(ABC)     „       „      ABC's,   „      „    possessing attributes A, B and C
(αBC)     „       „      αBC's,   „      „    possessing attributes B and C but not A
(αβC)     „       „      αβC's,   „      „    possessing attribute C but neither A nor B

and so on for any number of attributes. If A represent, as in the
illustration above, blindness, B deafness, C insanity, the symbols given stand for
the numbers of the blind, the not-blind, the blind and deaf, the deaf but
not-blind, the blind, deaf and insane, the deaf and insane but not-blind, and the
insane but neither blind nor deaf, respectively.
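
The notation lends itself to a direct sketch in Python. Here each individual is recorded as the set of attributes he possesses, and a class-frequency is found by counting. The records are invented, and the function name `frequency` and the use of an empty symbol for the whole number of observations are our own conventions.

```python
# Greek letter -> the attribute whose absence it denotes.
GREEK = {"α": "A", "β": "B", "γ": "C"}

def frequency(symbol, records):
    """Number of records belonging to the class named by `symbol`.

    A capital letter requires the attribute to be present; the
    corresponding Greek letter requires it to be absent. The empty
    symbol imposes no condition and so counts every record.
    """
    count = 0
    for possessed in records:
        belongs = True
        for letter in symbol:
            if letter in GREEK:
                belongs = belongs and GREEK[letter] not in possessed
            else:
                belongs = belongs and letter in possessed
        count += belongs
    return count

# Six invented individuals: A blindness, B deafness, C insanity.
records = [{"A", "B"}, {"B"}, set(), {"A", "B", "C"}, {"C"}, {"A"}]

for symbol in ["A", "α", "AB", "αB", "ABC", "αBC", "αβC"]:
    print(f"({symbol}) = {frequency(symbol, records)}")
```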


Positive Attributes. 

1.9. The attributes denoted by capitals A, B, C, . . . may be termed
positive attributes, and their contraries denoted by Greek letters negative
attributes. If a class-symbol include only capital letters, the class may
be termed a positive class; if only Greek letters, a negative class. Thus
the classes A, AB, ABC are positive classes; the classes α, αβ, αβγ,
negative classes.

If two classes are such that every attribute in the symbol for the one 
is the negative or contrary of the corresponding attribute in the symbol 
for the other, they may be termed contrary classes and their frequencies 
contrary frequencies; e.g. AB and αβ, Aβ and αB, AβC and αBγ, are
pairs of contraries. 
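
The rule for forming a contrary may be written out mechanically: every letter of the symbol is exchanged for its opposite. The sketch assumes three attributes only, and the name `contrary` is ours.

```python
# Each capital paired with its Greek contrary, and the reverse.
SWAP = {"A": "α", "B": "β", "C": "γ"}
SWAP.update({greek: capital for capital, greek in list(SWAP.items())})

def contrary(symbol):
    """Class symbol in which every attribute is replaced by its contrary."""
    return "".join(SWAP[letter] for letter in symbol)

for symbol in ["AB", "Aβ", "AβC"]:
    print(symbol, "and", contrary(symbol), "are contraries")
```

Applying `contrary` twice returns the original symbol, so contraries always occur in pairs.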

1.10. If we make a certain dichotomy with regard to a definite
attribute A (such as male sex, blindness or blue eyes), it may be of
practical importance to note a possible distinction in the nature of the
class not-A. The complementary class may, in fact, either be equally
definite (female sex, ability to see) or it may be a mere heterogeneous
remainder, as in our last instance (not-blue-eyed), the not-blue-eyed
being brown-eyed, grey-eyed, or even possessing no eyes at all.

Logically, this distinction is difficult to maintain, but practically it is 
of some importance. The statistical data in official returns are almost 
always classified according to positive and clearly defined attributes. 
For example, we are given the numbers of persons dying from typhoid, 
not the numbers who did not die of typhoid ; the number of acres under 
grass, not the number of acres not under grass. 

Order of Classes and Class-frequencies. 

1.11. The classes obtained by noting, say, n attributes fall into natural 
groups according to the numbers of attributes used to specify the respective 
classes, and these natural groups should be borne in mind in tabulating
the class-frequencies. A class specified by r attributes may be spoken of
as a class of the rth order and its frequency as a frequency of the rth
order. Thus AB, AC, BC are classes of the second order; (A), (AB),
(αBC), (ABγD), class-frequencies of the first, second, third and fourth
orders respectively. 

Aggregates. 

1.12. The classes of one and the same order fall into further groups
according to the actual attributes specified. Thus if three attributes 
A , B, C have been noted, the classes of the second order may be specified 
by any one of the pairs of attributes AB, AC or BC (and their contraries). 
The series of classes or class-frequencies given by any one positive class 
and the classes whose symbols are derived therefrom by substituting 
Greek letters for one or more of the italic capital letters in every possible 
way will be termed an aggregate. Thus (AB), (Aβ), (αB), (αβ) form an
aggregate of frequencies of the second order, and the twelve classes of the
second order which can be formed where three attributes have been noted
may be grouped into three such aggregates. 
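
The formation of an aggregate from a positive class may be sketched thus, for the case of three attributes. The function name `aggregate` is our own.

```python
from itertools import combinations, product

GREEK = {"A": "α", "B": "β", "C": "γ"}

def aggregate(positive_symbol):
    """Symbols obtained from a positive class by substituting Greek
    letters for one or more capitals in every possible way, together
    with the positive class itself."""
    choices = [(letter, GREEK[letter]) for letter in positive_symbol]
    return ["".join(combo) for combo in product(*choices)]

# The second-order classes for three attributes, grouped by aggregate.
aggregates = {"".join(pair): aggregate(pair) for pair in combinations("ABC", 2)}

for positive, members in aggregates.items():
    print(positive, "->", members)
```

Three aggregates of four classes each account for the twelve classes of the second order.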

1.13. Class-frequencies should, in tabulating, be arranged so that 
frequencies of the same order and frequencies belonging to the same 
aggregate are kept together. Thus the frequencies for the case of three
attributes should be grouped as given below, the whole number of
observations denoted by the letter N being reckoned as a frequency of order zero,
since no attributes are specified. 


Order 0.    N

Order 1.    (A)     (B)     (C)
            (α)     (β)     (γ)

Order 2.    (AB)    (AC)    (BC)
            (Aβ)    (Aγ)    (Bγ)
            (αB)    (αC)    (βC)
            (αβ)    (αγ)    (βγ)

Order 3.    (ABC)   (αBC)
            (ABγ)   (αBγ)
            (AβC)   (αβC)
            (Aβγ)   (αβγ)                                   (1.1)



The Total Number of Class-frequencies.

1.14. In such a complete table for the case of three attributes,
twenty-seven distinct frequencies are given: 1 of order zero, 6 of the first
order, 12 of the second and 8 of the third.

In general, for n attributes, there are $3^n$ distinct class-frequencies, if we
count N as a frequency of order 0.

To demonstrate this, let us consider the number of classes of different 
orders. 

Of order 0 there is one class N . 

Of order 1 there are 2n classes, for classes of this order contain only one
symbol, and each of the n attributes contributes two symbols, one of the
type A and one of the type α.

Of order 2 there are $\frac{n(n-1)}{2} \times 2^2$ classes, for each class contains two
symbols, two attributes can be chosen from n in $\frac{n(n-1)}{2}$ ways, and each
pair gives rise to $2^2$ different frequencies of the types (AB), (Aβ), (αB)
and (αβ).

Similarly, it may be seen that of order r there are

$$\frac{n(n-1)\cdots(n-r+1)}{r!} \times 2^r$$

classes.

Hence, the total number of class-frequencies is

$$1 + 2n + \frac{n(n-1)}{2} \times 2^2 + \cdots + \frac{n(n-1)\cdots(n-r+1)}{r!} \times 2^r + \cdots$$

and this is the binomial expansion of $(1+2)^n = 3^n$.

It is clear that if n is at all large the number of class-frequencies will be
very great. For instance, if n = 6, the number is 729.
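
The count may be checked by writing out every class symbol and tallying by order. The sketch below handles up to six attributes; the function name `class_symbols` and the use of "N" for the class of order zero are our own conventions.

```python
from itertools import combinations, product
from math import comb

GREEK = dict(zip("ABCDEF", "αβγδεζ"))

def class_symbols(letters):
    """All class symbols for the given attributes, grouped by order."""
    by_order = {}
    for r in range(len(letters) + 1):
        symbols = []
        # choose which r attributes are specified ...
        for chosen in combinations(letters, r):
            # ... and let each be present (capital) or absent (Greek)
            for signs in product((True, False), repeat=r):
                symbol = "".join(
                    a if present else GREEK[a]
                    for a, present in zip(chosen, signs)
                )
                symbols.append(symbol or "N")
        by_order[r] = symbols
    return by_order

three = class_symbols("ABC")
for order, symbols in three.items():
    print(f"order {order}: {len(symbols)} classes")

total_for_six = sum(len(s) for s in class_symbols("ABCDEF").values())
print("total for six attributes:", total_for_six)
```

For each order r the number of symbols produced equals `comb(n, r) * 2**r`, the general term of the expansion given above.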

1.15. Fortunately, however, the class-frequencies are not independent
of one another, and it is not necessary, in order to specify the data
completely, to give every class-frequency.

In the first place, let us note the simple result that any class-frequency 
can always be expressed in terms of class-frequencies of higher order. For 
the whole number of observations must clearly be equal to the number of 
A's added to the number of α's, i.e.

N = (A) + (α)   . . . . (1.2)

Similarly, the number of A's is equal to the number of A's which are
B's added to the number of A's which are β's, i.e.

(A) = (AB) + (Aβ)   . . . . (1.3)

Similarly,

(AB) = (ABC) + (ABγ)   . . . (1.4)

and so on. 

Ultimate Class-frequencies. 

1.16. It follows at once from the result we have just given that every 
class-frequency can be expressed in terms of the frequencies of the highest
order, i.e. of order n. For any frequency can be analysed into higher
frequencies, and the process need only stop when we have reached the
frequencies of highest order. For example, with three attributes,

(A) = (AB) + (Aβ)
    = (ABC) + (ABγ) + (AβC) + (Aβγ)

The frequencies of the classes specified by n attributes, i.e. those of the
highest order, are termed the ultimate class-frequencies.

Our result may then be expressed in the form: Every class-frequency
can be expressed as the sum of certain of the ultimate class-frequencies. To
specify the data completely it is, therefore, only necessary to give the
ultimate class-frequencies.

Example 1.1. (See ref. (09).) A number of school-children were
examined for the presence or absence of certain defects of which three chief
descriptions were noted: A, development defects; B, nerve signs; C, low
nutrition.

Given the following ultimate frequencies, find the frequencies of the
positive classes, including the whole number of observations N:


(ABC)      57        (αBC)      78
(ABγ)     281        (αBγ)     670
(AβC)      86        (αβC)      65
(Aβγ)     453        (αβγ)    8310


The whole number of observations N is equal to the grand total:
N = 10,000.

The frequency of any first-order class, e.g. (A), is given by the total of
the four third-order frequencies the class-symbols for which contain the
same letter:

(ABC) + (ABγ) + (AβC) + (Aβγ) = (A) = 877

Similarly, the frequency of any second-order class, e.g. (AB), is given
by the total of the two third-order frequencies the class-symbols for which
both contain the same pair of letters:

(ABC) + (ABγ) = (AB) = 338

The complete results are:


N       10,000        (AB)      338
(A)        877        (AC)      143
(B)      1,086        (BC)      135
(C)        286        (ABC)      57
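
The additions of Example 1.1 may be verified mechanically. In the sketch below the ultimate frequencies are those given in the example; the dictionary layout and the function name `positive_frequency` are our own.

```python
# Ultimate class-frequencies of Example 1.1.
ultimate = {
    "ABC": 57,  "αBC": 78,
    "ABγ": 281, "αBγ": 670,
    "AβC": 86,  "αβC": 65,
    "Aβγ": 453, "αβγ": 8310,
}

def positive_frequency(symbol, ultimate):
    """Total of the ultimate frequencies whose class-symbols contain
    every letter of `symbol`. The empty symbol matches them all and
    therefore gives N."""
    return sum(
        freq for ultimate_symbol, freq in ultimate.items()
        if all(letter in ultimate_symbol for letter in symbol)
    )

positive = {
    symbol or "N": positive_frequency(symbol, ultimate)
    for symbol in ["", "A", "B", "C", "AB", "AC", "BC", "ABC"]
}

for symbol, freq in positive.items():
    print(symbol, freq)
```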


The Number of Ultimate Class-frequencies. 

1.17. The class-frequencies of highest order each contain n symbols.
Now each letter corresponding to a particular attribute may be written
in two ways; A or α, B or β, etc. Hence the total number of possible
symbols is

$$2 \times 2 \times 2 \times \cdots \ (n \text{ factors}) = 2^n$$

and this is the number of ultimate class-frequencies.

Hence the $3^n$ frequencies may all be expressed in terms of the $2^n$
ultimate frequencies. For example, if n = 6, the 729 frequencies can be
written in terms of 64 ultimate class-frequencies, which specify the data
completely.

Fundamental Sets. 

1.18. The ultimate frequencies are, however, not the only set which 
specify the whole of the data. In fact, any set will serve the purpose 
provided that (a) they are $2^n$ in number, and (b) they are algebraically
independent; that is to say, when they are written symbolically no one can 
be expressed in terms of some or all of the others. 

We may call such a set of frequencies a fundamental set. 


The Positive Class-frequencies form a Fundamental Set. 

1.19. The positive class-frequencies, including under this head the
total number of observations N, form one such set. They are algebraically
independent; no one positive class-frequency can be expressed wholly
in terms of the others. Their number is, moreover, $2^n$, as may be readily
seen from the fact that if the Greek letters are struck out of the symbols
for the ultimate classes, they become the symbols for the positive classes,
with the exception of αβγ . . . for which N must be substituted.
Alternatively we may, in the manner of 1.14, prove the result by considering
the number of positive class-frequencies of each order. The number is
made up as follows:


Order 0. (The whole number of observations) . . . . . . . . . . 1
Order 1. (The number of attributes noted) . . . . . . . . . . . n
Order 2. (The number of combinations of n things 2 together) . . n(n - 1)/(1 . 2)
Order 3. (The number of combinations of n things 3 together) . . n(n - 1)(n - 2)/(1 . 2 . 3)

and so on. But the series

1 + n + n(n - 1)/(1 . 2) + n(n - 1)(n - 2)/(1 . 2 . 3) + . . .

is the binomial expansion of (1 + 1)ⁿ or 2ⁿ; therefore the total number of
positive classes is 2ⁿ.
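The counting argument is easy to check numerically. The following is a minimal sketch; the function name is hypothetical, and the count of classes of all orders takes each class of order k to have k letters, each written in one of two ways.

```python
from math import comb

def count_classes(n):
    """Return (positive classes, ultimate classes, classes of all orders)."""
    # Positive classes: one for each way of choosing k of the n attributes.
    positive = sum(comb(n, k) for k in range(n + 1))
    # Ultimate classes: each of the n letters is either Roman or Greek.
    ultimate = 2 ** n
    # All classes: choose k attributes, then Roman or Greek for each of them.
    total = sum(comb(n, k) * 2 ** k for k in range(n + 1))
    return positive, ultimate, total

for n in (1, 2, 3, 6):
    print(n, count_classes(n))
```

For n = 6 this reproduces the 64 ultimate classes and 729 classes in all quoted in 1.17.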

1.20. The set of positive class-frequencies is a most convenient one 
for both theoretical and practical purposes. 

Compare, for instance, the two forms of statement, in terms of the
ultimate and the positive classes respectively, as given in Example 1.1.
The latter gives directly the whole number of observations and the totals
of A’s, B’s and C’s. The former gives none of these fundamentally
important figures without the performance of more or less lengthy additions.
Further, the latter gives the second-order frequencies (AB), (AC) and
(BC), which are necessary for discussing the relations subsisting between
A, B and C, but are only indirectly given by the frequencies of the ultimate
classes.

1.21. We are now able to indicate the applications of the foregoing 
analysis to some practical problems. 



The typical problem which arises in this connection is the following: 
Given certain class-frequencies, to find them all. 

In the first place, we may remark at once that unless 2ⁿ independent
class-frequencies are given the problem is insoluble. We might be able
to find some of the frequencies, but it is certain that we could not find
every one. We shall reserve to a later chapter the consideration of what
can be done with such incomplete data. In the examples of this chapter
we shall deal only with data which specify the problem completely.

Example 1.2. Given the positive class-frequencies of Example 1.1, to
find all the class-frequencies.

The data are:

N = 10,000; (A) = 877; (B) = 1086; (C) = 286; (AB) = 338;
(AC) = 143; (BC) = 135; (ABC) = 57.

We have:

(AB) = (ABγ) + (ABC)

or

338 = (ABγ) + 57

i.e.

(ABγ) = 281

Similarly, from (AC) and (BC) we find:

(AβC) = 86
(αBC) = 78

This gives us the three ultimate class-frequencies which contain only
one Greek letter. For the others,

(αβC) = (βC) - (AβC)
      = (C) - (BC) - (AβC)
      = 286 - 135 - 86
      = 65

Similarly, we have:

(Aβγ) = 453
(αBγ) = 670

Finally,

(αβγ) = (βγ) - (Aβγ)
      = (γ) - (Bγ) - (Aβγ)
      = N - (C) - {(B) - (BC)} - (Aβγ)
      = 10,000 - 286 - 951 - 453
      = 8310

We can now calculate any class-frequency by expressing it in terms of
the ultimate class-frequencies, e.g.

(αγ) = (αBγ) + (αβγ)
     = 670 + 8310
     = 8980

It is, of course, also possible to calculate these frequencies by expressing
them directly in terms of the given frequencies, e.g.

(αγ) = (γ) - (Aγ)
     = N - (C) - {(A) - (AC)}
     = 10,000 - 286 - 877 + 143
     = 8980
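The step-by-step reduction used in this example, in which one Greek letter at a time is removed by a subtraction such as (αγ) = (γ) - (Aγ), lends itself to a short recursive sketch. The dictionary representation and the function name are assumptions for illustration: a class is given as a mapping from attribute letter to True (Roman letter) or False (Greek letter), and letters left out are unspecified.

```python
# Positive class-frequencies of Example 1.1 / 1.2.
positive = {"N": 10000, "A": 877, "B": 1086, "C": 286,
            "AB": 338, "AC": 143, "BC": 135, "ABC": 57}

def frequency(cls, positive):
    """Frequency of the class described by cls, e.g. {"A": False, "C": False}."""
    for letter, present in cls.items():
        if not present:
            # Remove one Greek letter: (aX) = (X) - (AX).
            rest = {k: v for k, v in cls.items() if k != letter}
            with_letter = {**rest, letter: True}
            return frequency(rest, positive) - frequency(with_letter, positive)
    # Only Roman letters remain, so this is a positive class (or N itself).
    return positive["".join(sorted(cls)) or "N"]

print(frequency({"A": False, "C": False}, positive))               # the class (αγ)
print(frequency({"A": False, "B": False, "C": False}, positive))   # the class (αβγ)
```

Each recursive call removes one Greek letter, so a class with k Greek letters is reduced to positive classes in k levels, exactly as in the hand calculation.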

Example 1.3. In a free vote in the House of Commons, 600 members
voted. 300 Government members representing English constituencies
(including Welsh) voted in favour of the motion. 25 Opposition members
representing Scottish constituencies voted against the motion. The
Government majority among those who voted was 96. 135 of the
members voting represented Scottish constituencies. 18 Government
members voted against the motion. 102 Scottish members voted in
favour of the motion. The motion was carried by 310 votes. Analyse
the voting according to the nationality of the constituencies and party.

Denoting the Government and Opposition parties by A and α respectively,
voting for and against the motion by B and β, and English and
Scottish members by C and γ respectively, our data, in the order of the
question, are:

N = 600                 (a)
(ABC) = 300             (b)
(αβγ) = 25              (c)
(A) - (α) = 96          (d)
(γ) = 135               (e)
(Aβ) = 18               (f)
(Bγ) = 102              (g)
(B) - (β) = 310         (h)

There are eight independent data, and we may
therefore expect them to give us the eight ultimate classes.
(b) and (c) already give us two.

From (a) we have:

N = (A) + (α) = 600

From (d):

(A) - (α) = 96

Hence,

(A) = 348               (i)
(α) = 252               (j)

Similarly, from (a) and (h) we obtain:

(B) = 455               (k)
(β) = 145               (l)

From (a) and (e) we have:

(C) = N - (γ) = 465     (m)

We have thus found all first-order frequencies.
(i) and (f) give

(AB) = (A) - (Aβ)
     = 330              (n)

(k) and (g) give

(BC) = (B) - (Bγ)
     = 353              (o)

We also have:

(αβγ) = (βγ) - (Aβγ)
      = (γ) - (Bγ) - {(A) - (AC) - (AB) + (ABC)}

and substituting the known values on the right and the value of (αβγ),
we have

25 = 135 - 102 - 348 + (AC) + 330 - 300

or

(AC) = 310              (p)

From (n) and (b) we get

(ABγ) = (AB) - (ABC) = 30       (q)

From (o) and (b) we get, similarly,

(αBC) = 53              (r)

From (p) and (b) we get

(AβC) = 10              (s)

From (e) and (g):

(βγ) = (γ) - (Bγ) = 33

Hence,

(Aβγ) = (βγ) - (αβγ) = 8        (t)

From (f) and (l):

(αβ) = 127

Hence,

(αβC) = (αβ) - (αβγ)
      = 102             (u)

Finally, N = sum of ultimate class-frequencies, and this gives

(αBγ) = 72              (v)

This straightforward but rather heavy analysis has therefore given us
the eight ultimate class-frequencies in equations (b), (c), (q), (r), (s), (t),
(u) and (v).
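A result of this kind is easily checked by recomputing the quantities named in the question from the eight ultimate class-frequencies. The sketch below does so; the lower-case letters a, b, c standing for α, β, γ and the helper name are conventions adopted here only for illustration.

```python
# Ultimate class-frequencies found in Example 1.3.
# A/a = Government/Opposition, B/b = for/against, C/c = English/Scottish.
ultimate = {"ABC": 300, "ABc": 30, "AbC": 10, "Abc": 8,
            "aBC": 53, "aBc": 72, "abC": 102, "abc": 25}

def freq(pattern):
    """Total of the ultimate classes matching pattern; '.' matches either letter."""
    return sum(value for key, value in ultimate.items()
               if all(p in (".", k) for p, k in zip(pattern, key)))

summary = {
    "members voting": freq("..."),
    "Government majority among voters": freq("A..") - freq("a.."),
    "Scottish members": freq("..c"),
    "Government members against": freq("Ab."),
    "Scottish members in favour": freq(".Bc"),
    "majority for the motion": freq(".B.") - freq(".b."),
}
for description, value in summary.items():
    print(description, value)
```

The printed figures can be set against the statements of the question one by one.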

1.22. The data encountered in practice are rarely dichotomised 
according to more than three or four variables, and the student should 
experience little difficulty in expressing any class-frequency in terms of 
the known class-frequencies, either directly, or by first finding the ultimate 
class-frequencies and then expressing the desired frequency in terms of 
them. 

It is, however, interesting to note the general result that the class 
symbols can be treated as operators and multiplied together like algebraical 
quantities. Let us write A . N for the operation of dichotomising N 
according to A , and write 

A . N = (A)

which is the symbolic way of saying that if we dichotomise N according to
A we get a class-frequency equal to (A). We can similarly put

α . N = (α)

Adding these two, and putting A . N + α . N equal to (A + α) . N, we have:

(A + α) . N = N

so that we may take

A + α = 1

In any symbolic expression we can therefore replace the operators A or α
by 1 - α, 1 - A, respectively.

Furthermore, since (AB) = A . (B) = B . (A), we may take the symbol
AB . N to be the dichotomy of N according to both A and B, and equate
it to (AB). A little reflection will show that the operative symbols therefore
obey the ordinary laws of algebra and in particular may be multiplied
together.

For example, we have:

(αβ) = αβ . N = (1 - A)(1 - B) . N
     = (1 - A - B + AB) . N
     = N - (A) - (B) + (AB)                                    (1.5)

And, similarly,

(αβγ) = αβγ . N
      = (1 - A)(1 - B)(1 - C) . N
      = (1 - A - B - C + AB + BC + AC - ABC) . N
      = N - (A) - (B) - (C) + (AB) + (AC) + (BC) - (ABC)       (1.6)

Similar results could, of course, be obtained by step-by-step substitution;
for instance,

(αβ) = (α) - (αB)
     = N - (A) - (B) + (AB)
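The operator rule can be mechanised: each Greek letter is replaced by 1 minus the corresponding Roman letter and the product is multiplied out. The sketch below does this symbolically and prints the signed positive class symbols; the function names are hypothetical, and the Greek letters are passed in as the Roman letters they negate.

```python
from itertools import combinations

def expand_negatives(roman, negated):
    """Expand roman . (1 - X)(1 - Y)... . N into (sign, symbol) terms.

    roman   -- letters present as such, e.g. "A"
    negated -- letters appearing as Greek, given in Roman form, e.g. "BC"
    """
    terms = []
    for k in range(len(negated) + 1):
        for chosen in combinations(negated, k):
            # Choosing k of the negated letters contributes the sign (-1)^k.
            symbol = "".join(sorted(roman + "".join(chosen)))
            terms.append(((-1) ** k, f"({symbol})" if symbol else "N"))
    return terms

def format_terms(terms):
    text = ""
    for sign, name in terms:
        text += (" + " if sign > 0 else " - ") + name
    return text.lstrip(" +")

print("(αβ)  =", format_terms(expand_negatives("", "AB")))
print("(αβγ) =", format_terms(expand_negatives("", "ABC")))
print("(Aβγ) =", format_terms(expand_negatives("A", "BC")))
```

The first two lines printed correspond to equations (1.5) and (1.6), apart from the order in which the second-order terms are written.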

1.23. The symbolism we have discussed in this chapter is also of use
in deducing results of a less definite character expressible by inequalities.

Example 1.4. In a war between White and Red forces there are more
Red soldiers than White; there are more armed Whites than unarmed
Reds; there are fewer armed Reds with ammunition than unarmed Whites
without ammunition. Show that there are more armed Reds without
ammunition than unarmed Whites with ammunition.

Writing A to denote the property of being a White soldier, and hence α
to denote the property of being a Red soldier; writing B and β to denote
armed and unarmed, respectively; and writing C and γ to denote the
possession or non-possession of ammunition, respectively, our data are:

(α) > (A)               (a)
(AB) > (αβ)             (b)
(Aβγ) > (αBC)           (c)

We have to show that

(αBγ) > (AβC)

From (a), considering the dichotomy of each side according to B, we
have:

(αB) + (αβ) > (AB) + (Aβ)

Substituting for (AB) from (b) in this inequality,

(αB) + (αβ) > (αβ) + (Aβ)

and hence,

(αB) > (Aβ)

From this, considering the dichotomy of each side according to C, we have

(αBC) + (αBγ) > (AβC) + (Aβγ)

and in virtue of (c) this gives

(αBγ) > (AβC)

which is the required result.

1.24. The symbols of our notation are, it should be remarked, used 
in an inclusive sense, the symbol A , for example, signifying an object or 
individual possessing the attribute A with or without others. This seems 
to be the only natural use of the symbol, but at least one notation has been 
constructed on an exclusive basis, the symbol A denoting that the object 
or individual possesses the attribute A , but not B or C or D , or whatever 
other attributes have been noted. An exclusive notation is apt to be 
relatively cumbrous and also ambiguous, for the reader cannot know what 
attributes a given symbol excludes until he has seen the whole list of 
attributes of which note has been taken, and this list he must bear in mind. 
The statement that the symbol A is used exclusively cannot mean, 
obviously, that the object referred to possesses only the attribute A and no 
others whatever ; it merely excludes the other attributes noted in the 
particular investigation. Adjectives, as well as the symbols which may
represent them, are naturally used in an inclusive sense, and care should
therefore be taken, when classes are verbally described, that the description
is complete, and states what, if anything, is excluded as well as what is
included, in the same way as our notation. The terminology of some tables
in our older English census has not, in this respect, been quite clear. The
“Blind” includes those who are “Blind and Dumb,” or “Blind, Dumb
and Lunatic,” and so forth. But the heading “Blind and Dumb,” in the
table relating to “combined infirmities,” is used in the sense “Blind and
Dumb, but not Lunatic or Imbecile,” etc., and so on for the others. In
the first table the headings are inclusive, in the second exclusive.


SUMMARY. 

1. A collection of individuals may be divided into two classes according
to whether they do or do not possess a particular attribute. This process
is called dichotomy.

2. Continued dichotomy according to n attributes gives rise to 3ⁿ
classes.

3. The frequencies in these classes can be expressed in terms of the 2ⁿ
ultimate class-frequencies, or of the 2ⁿ positive class-frequencies.

4. Given 2ⁿ independent class-frequencies, all the class-frequencies may
be calculated by simple arithmetical processes.





EXERCISES. 


1.1. (Figures from ref. (69).) The following are the numbers of boys observed
with certain classes of defects amongst a number of school-children. A denotes
development defects; B, nerve signs; C, low nutrition.

(ABC)      149        (αBC)       204
(ABγ)      738        (αBγ)     1,762
(AβC)      225        (αβC)       171
(Aβγ)    1,196        (αβγ)    21,842


Find the frequencies of the positive classes. 



1.2. (Figures from ref. (69).) The following are the frequencies of the
positive classes for the girls in the same investigation:

N      23,713        (AB)     587
(A)     1,618        (AC)     428
(B)     2,015        (BC)     335
(C)       770        (ABC)    156


Find the frequencies of the ultimate classes. 



1.3. (Figures from Census, England and Wales, 1891, vol. 3.) Convert
the census statement as below into a statement in terms of (a) the positive,
(b) the ultimate class-frequencies. A = blindness, B = deaf-mutism, C = mental
derangement.

N      29,002,525        (ABγ)     82
(A)        23,167        (AβC)    380
(B)        14,192        (αBC)    500
(C)        97,383        (ABC)     25



1.4. (Cf. Mill’s “Logic,” bk. 3, ch. 17, and ref. (65).) Show that if A occurs
in a larger proportion of the cases where B is than where B is not, then B will
occur in a larger proportion of the cases where A is than where A is not: i.e.
given (AB)/(B) > (Aβ)/(β), show that (AB)/(A) > (αB)/(α).

1.5. (Cf. De Morgan, “Formal Logic,” p. 166, and ref. (65).) Most B’s are
A’s, most B’s are C’s: find the least number of A’s that are C’s, i.e. the lowest
possible value of (AC).

1.6. Given that

(A) = (α) = (B) = (β) = ½N

show that

(AB) = (αβ), (Aβ) = (αB)

1.7. (Cf. ref. (78), Section 9, “Case of equality of contraries.”) Given that

(A) = (α) = (B) = (β) = (C) = (γ) = ½N

and also that

(ABC) = (αβγ)

show that

2(ABC) = (AB) + (AC) + (BC) - ½N


1.8. Measurements are made on a thousand husbands and a thousand wives.
If the measurements of the husbands exceed the measurements of the wives in
800 cases for one measurement, in 700 cases for another, and in 660 cases for
both measurements, in how many cases will both measurements on the wife
exceed the measurements on the husband?

1.9. 100 children took three examinations. 40 passed the first, 39 passed
the second and 48 passed the third. 10 passed all three, 21 failed all three,
9 passed the first two and failed the third, 19 failed the first two and passed the
third. Find how many children passed at least two examinations.



Show that for the question asked certain of the given frequencies are not
necessary. Which are they?

Show further that the data are not sufficient to permit of the determination
of the ultimate class-frequencies.

1.10. (Lewis Carroll, “A Tangled Tale,” 1881.) In a very hotly fought battle
70 per cent. at least of the combatants lost an eye, 75 per cent. at least lost an
ear, 80 per cent. at least lost an arm and 85 per cent. at least lost a leg. How
many at least must have lost all four?

1.11. Show that for n attributes A, B, C, . . . M,

(ABC . . . M) ≮ {(A) + (B) + (C) + . . . + (M)} - (n - 1)N

where N is the total frequency; and hence generalise the result of Exercise 1.10.



CHAPTER 2. 


CONSISTENCE OF DATA. 

Universe of Discourse. 

2.1. Any statistical inquiry is necessarily confined to a certain time,
space or material. An investigation on the prevalence of unemployment,
for instance, may be limited to England, to England in 1931, to English
males in 1931, or even to English males over 50 years of age in 1931,
and so on.

For actual work on any given subject, no term is required to denote
the material to which the work is so confined: the limits are specified,
and that is sufficient. But for theoretical purposes some term is almost
essential to avoid circumlocution. The expression the universe of
discourse, or simply the universe, used in this sense by writers on
logic, may be adopted as familiar and convenient.

2.2. The universe, like any class, may be considered as specified
by an enumeration of the attributes common to all its members; e.g.
taking the illustration of 2.1, those attributes implied by the predicates
English, male, over 50 years of age, living in 1931. It is not, in
general, necessary to introduce a special letter into the class-symbols to
denote the attributes common to all members of the universe. We know
that such attributes must exist, and the common symbol can be understood.

In strictness, however, the symbol ought to be written: if, say, U
denote the combination of attributes, English, male, over 50, living in
1931, A unemployed, B married, we should strictly use the symbols:

(U)   = Number of English males over 50 living in 1931
(UA)  = Number of unemployed English males over 50 living in 1931
(UB)  = Number of married English males over 50 living in 1931
(UAB) = Number of unemployed and married English males over 50
        living in 1931

instead of the simpler symbols N, (A), (B), (AB). Similarly, the general
relations of equations (1.2), (1.3) and (1.4), page 15, using U to denote the
common attributes of all the members of the universe and (U) consequently
the total number of observations N, should in strictness be written in the
form:

(U)   = (UA) + (Uα) = (UB) + (Uβ) = etc.
      = (UAB) + (UAβ) + (UαB) + (Uαβ) = etc.
(UA)  = (UAB) + (UAβ) = (UAC) + (UAγ) = etc.
(UAB) = (UABC) + (UABγ) = etc.

Specifying the Universe.

Specifying the Universe. 

2.3. Clearly, however, we might have used any other symbol instead
of U to denote the attributes common to all the members of the universe,
e.g. A or B or AB or ABC, writing in the latter case:

(ABC) = (ABCD) + (ABCδ)

and so on. Hence any attribute or combination of attributes common to all
the class-symbols in an equation may be regarded as specifying the universe
within which the equation holds good. Thus the equation just written may
be read in words: “The number of objects or individuals in the universe
ABC is equal to the number of D’s together with the number of not-D’s
within the same universe.” The equation

(AC) = (ABC) + (AβC)

may be read: “The number of A’s is equal to the number of A’s that are
B’s together with the number of A’s that are not-B’s within the universe C.”

2.4. The more complex relations between class-frequencies may be
derived from the simpler ones very readily by the process of specifying
the universe. Thus, starting from the simple equation

(α) = N - (A)

we have, by specifying the universe as β,

(αβ) = (β) - (Aβ)
     = N - (A) - (B) + (AB)

Specifying the universe, again, as γ, we have:

(αβγ) = (γ) - (Aγ) - (Bγ) + (ABγ)
      = N - (A) - (B) - (C) + (AB) + (AC) + (BC) - (ABC)

Consistence. 

2.5. Any class-frequencies which have been or might have been
observed within one and the same universe may be said to be consistent
with one another. They conform with one another, and do not in any
way conflict.

The conditions of consistence are some of them simple, but others are
by no means of an intuitive character. Suppose, for instance, the following
data are given:

N      1000        (AB)      42
(A)     525        (AC)     147
(B)     312        (BC)      86
(C)     470        (ABC)     25

There is nothing obviously wrong with the figures. Yet they are
certainly inconsistent. They might have been observed at different
times, in different places or on different material, but they cannot have
been observed in one and the same universe. They imply, in fact, a
negative value for (αβγ):

(αβγ) = 1000 - 525 - 312 - 470 + 42 + 147 + 86 - 25
      = 1000 - 1307 + 275 - 25
      = -57

Clearly no class-frequency can be negative. If the figures, consequently,
are alleged to be the result of an actual inquiry in a definite
universe, there must have been some miscount or misprint.

Condition for Consistence.

2.6. It is, in fact, the necessary and sufficient condition for the
consistence of a set of independent class-frequencies that no ultimate
class-frequency be negative. It is necessary for the obvious reason that
no class-frequency occurring by counting real attributes can be negative;
it is sufficient because, given any non-negative set of 2ⁿ numbers, we can
always imagine a real universe with n dichotomies which should have these
numbers for its ultimate class-frequencies, and it is impossible for this real
universe to give inconsistent results.

Hence to test the consistence of a set of 2ⁿ algebraically independent
class-frequencies we need only calculate the ultimate class-frequencies and
ascertain whether any one is negative. If it is, the data are inconsistent.
If no ultimate frequency is negative, the data are consistent.
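The test just described can be sketched as follows for data given as positive class-frequencies. The function names and the use of lower-case letters for the Greek ones are assumptions made for illustration; the figures are those of 2.5.

```python
from itertools import combinations, product

def ultimate_frequencies(positive, attrs="ABC"):
    """Expand every ultimate class in terms of the positive class-frequencies."""
    out = {}
    for cls in product((True, False), repeat=len(attrs)):
        present = [a for a, p in zip(attrs, cls) if p]
        absent = [a for a, p in zip(attrs, cls) if not p]
        total = 0
        # Each absent attribute X acts as (1 - X); multiply the factors out.
        for k in range(len(absent) + 1):
            for extra in combinations(absent, k):
                key = "".join(sorted(present + list(extra))) or "N"
                total += (-1) ** k * positive[key]
        label = "".join(a if p else a.lower() for a, p in zip(attrs, cls))
        out[label] = total
    return out

def is_consistent(positive, attrs="ABC"):
    """Consistent if and only if no ultimate class-frequency is negative."""
    return all(f >= 0 for f in ultimate_frequencies(positive, attrs).values())

data = {"N": 1000, "A": 525, "B": 312, "C": 470,
        "AB": 42, "AC": 147, "BC": 86, "ABC": 25}
print(ultimate_frequencies(data))
print(is_consistent(data))
```

For these data the class written abc, i.e. (αβγ), comes out negative, so the function reports the set as inconsistent.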

Consistence of Positive Class-frequencies. 

2.7. For data given by a heterogeneous collection of class-frequencies,
consistence is best tested by actually calculating the ultimate frequencies.
We saw in the last chapter, however, that the positive class-frequencies
hold a peculiar position in that many data encountered in practice are
given entirely in terms of them alone. To save the trouble of calculating
the ultimate frequencies from them, we proceed to discuss the form which
the consistence conditions assume when expressed entirely in terms of the
positive class-frequencies. These conditions may be expressed symbolically
by expanding the ultimate in terms of the positive frequencies, and
writing each such expansion not less than zero. We will consider the cases
of one, two and three attributes in turn.

2.8. If only one attribute be noted, say A, the positive frequencies
are N and (A). The ultimate frequencies are (A) and (α), where

(α) = N - (A)

The conditions of consistence are therefore simply

(A) ≮ 0        N - (A) ≮ 0

or, more conveniently expressed,

(a) (A) ≮ 0        (b) (A) ≯ N                                 (2.1)

These conditions are obvious: the number of A’s cannot be less than
zero, nor exceed the whole number of observations.

2.9. If two attributes be noted there are four ultimate frequencies
(AB), (Aβ), (αB), (αβ). The following conditions are given by expanding
each in terms of the frequencies of positive classes:

(a) (AB) ≮ 0                  or (AB) would be negative
(b) (AB) ≮ (A) + (B) - N      or (αβ) would be negative
(c) (AB) ≯ (A)                or (Aβ) would be negative
(d) (AB) ≯ (B)                or (αB) would be negative        (2.2)



(a), (c) and (d) are obvious; (b) is perhaps a little less obvious, and is
occasionally forgotten. It is, however, of precisely the same type as the
other three. None of these conditions is really of a new form, but may be
derived at once from (2.1) (a) and (2.1) (b) by specifying the universe as B
or as β respectively. The conditions (2.2) are therefore really covered
by (2.1).

2.10. But a further point arises as regards such a system of limits as
is given by (2.2). The conditions (a) and (b) give lower or minor limits to
the value of (AB); (c) and (d) give upper or major limits. If either major
limit be less than either minor limit the conditions are impossible, and it is
necessary to see whether (A) and (B) can take such values that this may
be the case.

Expressing the condition that the major limits must be not less than
the minor, we have:

(A) ≮ 0        (B) ≮ 0
(A) ≯ N        (B) ≯ N

These are simply the conditions of the form (2.1). If, therefore, (A) and
(B) fulfil the conditions (2.1), the conditions (2.2) must be possible. The
conditions (2.1) and (2.2) therefore give all the conditions of consistence
for the case of two attributes, conditions of an extremely simple and
obvious kind.

2.11. Now consider the case of three attributes. There are eight
ultimate frequencies. Expanding the ultimate in terms of the positive
frequencies, and expressing the condition that each expansion is not less
than zero, we have:

                                                      or the frequency given
                                                      below will be negative
(a) (ABC) ≮ 0                                              (ABC)
(b)       ≮ (AB) + (AC) - (A)                              (Aβγ)
(c)       ≮ (AB) + (BC) - (B)                              (αBγ)
(d)       ≮ (AC) + (BC) - (C)                              (αβC)
(e)       ≯ (AB)                                           (ABγ)
(f)       ≯ (AC)                                           (AβC)
(g)       ≯ (BC)                                           (αBC)
(h)       ≯ (AB) + (AC) + (BC) - (A) - (B) - (C) + N       (αβγ)     (2.3)

These, again, are not conditions of a new form. We leave it as an
exercise for the student to show that they may be derived from (2.1) (a)
and (2.1) (b) by specifying the universe in turn as BC, Bγ, βC and βγ.
The two conditions holding in four universes give the eight inequalities
above.

2.12. As in the last case, however, these conditions will be impossible
to fulfil if any one of the major limits (e)-(h) be less than any one of the
minor limits (a)-(d). The values on the right must be such as to make
no major limit less than a minor.

There are four major and four minor limits, or sixteen comparisons in
all to be made. But twelve of these, the student will find, only lead back
to conditions of the form (2.2) for (AB), (AC) and (BC) respectively.
The four comparisons of expansions due to contrary frequencies ((a) and
(h), (b) and (g), (c) and (f), (d) and (e)) alone lead to new conditions, viz.



(a)  (AB) + (AC) + (BC) ≮ (A) + (B) + (C) - N
(b)  (AB) + (AC) - (BC) ≯ (A)
(c)  (AB) - (AC) + (BC) ≯ (B)
(d) -(AB) + (AC) + (BC) ≯ (C)                                  (2.4)

2.13. These are conditions of a wholly new type, not derivable in any
way from those given under (2.1) and (2.2). They are conditions for the
consistence of the second-order frequencies with each other, whilst the
inequalities of the form (2.2) are only conditions for the consistence of the
second-order frequencies with those of lower orders. Given any two of the
second-order frequencies, e.g. (AB) and (AC), the conditions (2.4) give
limits for the third, viz. (BC).

Incomplete Data. 

2.14. We can now take up a question which we set aside in Chapter 1,
namely, that of the inferences which may be drawn from data which, though
giving us a certain amount of information in the shape of class-frequencies,
yet are insufficient to enable us to calculate all the class-frequencies.

The form of the consistence conditions (2.4) shows that a knowledge of
certain class-frequencies allows us to assign limits to others, even though
we may not be able to find the actual values of those others. The following
will serve as illustrations of the statistical uses of the conditions:



Example 2.1. Given that (A) = (B) = (C) = ½N and 80 per cent. of
the A’s are B’s, 75 per cent. of A’s are C’s, find the limits to the percentage
of B’s that are C’s.

The data are:

2(AB)/N = 0.8,        2(AC)/N = 0.75

and the conditions (2.4) give:

(a) 2(BC)/N ≮ 1 - 0.8 - 0.75
(b) 2(BC)/N ≮ 0.8 + 0.75 - 1
(c) 2(BC)/N ≯ 1 - 0.8 + 0.75
(d) 2(BC)/N ≯ 1 + 0.8 - 0.75

(a) gives a negative limit and (d) a limit greater than unity; hence they
may be disregarded. From (b) and (c) we have:

2(BC)/N ≮ 0.55,        2(BC)/N ≯ 0.95

that is to say, not less than 55 per cent. nor more than 95 per cent. of
the B’s can be C’s.
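The limits that conditions (2.4) set on (BC) can be written as a small function. The sketch below is illustrative only: the function name is hypothetical, and N is taken as 200 so that the data of Example 2.1 become whole numbers, (A) = (B) = (C) = 100, (AB) = 80, (AC) = 75.

```python
def bc_limits_from_2_4(N, A, B, C, AB, AC):
    """Lower and upper limits to (BC) given by conditions (2.4) alone."""
    lower = max(A + B + C - N - AB - AC,   # (2.4)(a)
                AB + AC - A)               # (2.4)(b)
    upper = min(B - AB + AC,               # (2.4)(c)
                C + AB - AC)               # (2.4)(d)
    return lower, upper

lower, upper = bc_limits_from_2_4(N=200, A=100, B=100, C=100, AB=80, AC=75)
# With (B) = 100 these limits are also the percentages of B's that are C's.
print(lower, upper)
```

The function applies (2.4) only; in general the limits (2.2) for (BC), namely 0, (B) + (C) - N, (B) and (C), should be looked at as well.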

Example 2.2. If a report gives the following frequencies as actually
observed, show that there must be a misprint or mistake of some sort, and
that possibly the misprint consists in the dropping of a 1 before the 85
given as the frequency (BC):

N      1000
(A)     510        (AB)     189
(B)     490        (AC)     140
(C)     427        (BC)      85

From (2.4) (a) we have:

(BC) ≮ 510 + 490 + 427 - 1000 - 189 - 140
     ≮ 98

But 85 < 98, therefore it cannot be the correct value of (BC).
If we read 185 for 85 all the conditions are fulfilled.
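The same conclusion can be reached by direct search instead of through (2.4): since (ABC) is not given, we may try every whole-number value it could take and see whether any of them leaves all eight ultimate class-frequencies non-negative. This is a sketch of that alternative check, not the method of the text, and the function name is hypothetical.

```python
def feasible_abc(N, A, B, C, AB, AC, BC):
    """Values of (ABC) for which no ultimate class-frequency is negative."""
    feasible = []
    for t in range(min(AB, AC, BC) + 1):   # t is a trial value of (ABC)
        ultimate = [
            t,                                   # (ABC)
            AB - t, AC - t, BC - t,              # one Greek letter
            A - AB - AC + t,                     # (A beta gamma)
            B - AB - BC + t,                     # (alpha B gamma)
            C - AC - BC + t,                     # (alpha beta C)
            N - A - B - C + AB + AC + BC - t,    # (alpha beta gamma)
        ]
        if min(ultimate) >= 0:
            feasible.append(t)
    return feasible

as_printed = feasible_abc(1000, 510, 490, 427, 189, 140, 85)
corrected = feasible_abc(1000, 510, 490, 427, 189, 140, 185)
print(len(as_printed), len(corrected))
```

With (BC) = 85 no value of (ABC) is admissible, whereas with (BC) = 185 a whole range of values is, in agreement with the example.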

Example 2.3. In a certain set of 1000 observations (A) = 45, (B) = 23,
(C) = 14. Show that whatever the percentages of B’s that are A’s and of
C’s that are A’s, it cannot be inferred that any B’s are C’s.

The conditions (2.4) (a) and (b) give the lower limit of (BC), which is
required. We find:

(a) (BC)/N ≮ -(AB)/N - (AC)/N - 0.918
(b) (BC)/N ≮  (AB)/N + (AC)/N - 0.045

The first limit is clearly negative. The second must also be negative,
since (AB)/N cannot exceed 0.023 nor (AC)/N 0.014. Hence we cannot
conclude that there is any limit to (BC) greater than 0. This result is
indeed immediately obvious when we consider that, even if all the B’s
were A’s, and of the remaining 22 A’s 14 were C’s, there would still be
8 A’s that were neither B’s nor C’s.

2.15. The student should note the result of the last example, as it 
illustrates the sort of result at which one may often arrive by applying the 
conditions (2.4) to practical statistics. For given values of N, (A), (B), 
(C), (AB) and (AC), it will often happen that any value of (BC) not 
less than zero (or, more generally, not less than either of the lower limits 
(2.2) (a) and (2.2) (b)) will satisfy the conditions (2.4), and hence no 
true inference of a lower limit is possible. The argument of the type 
“So many A’s are B’s and so many B’s are C’s that we must expect some
A’s to be C’s” must be used with caution.

2.16. Where the data are not given in terms of the positive or of 
the ultimate class-frequencies, and cannot readily be thrown into such a 
form, the device illustrated in the following example is often useful:— 

Example 2.4. Among the adult population of a certain town 50 per
cent. of the population are male, 60 per cent. are wage-earners and
50 per cent. are 45 years of age or over. 10 per cent. of the males are
not wage-earners and 40 per cent. of the males are under 45. Can we
infer anything about what percentage of the population of 45 or over
are wage-earners?

Denoting the attributes male, wage-earner and 45 years old or more
by A, B and C, respectively, and letting N = 100 for convenience, our
data are:

(A)  = 50
(B)  = 60
(C)  = 50
(Aβ) = 5
(Aγ) = 20

We require the limits, if any, of (BC).

Let us note first of all that we are given 6 class-frequencies (including
N). If we knew two more, independent of these 6, the problem would be
completely determinate, for we should have 2³ class-frequencies.

Let us therefore put

(αβγ) = x
(ABC) = y

We can then solve for the ultimate class-frequencies and get

(ABγ) = 45 - y
(AβC) = 30 - y
(αBC) = x - 15
(Aβγ) = y - 25
(αBγ) = 30 - x
(αβC) = 35 - x

The condition that these must be non-negative gives us conditions on
x and y. In fact, from (αBC) and (αBγ) we get

15 ≯ x ≯ 30

and from (AβC) and (Aβγ),

25 ≯ y ≯ 30

the conditions from the other frequencies being included in these limits
to x and y.

Now (BC) = (ABC) + (αBC)
         = y + x - 15

and hence, from the limits to x and y,

25 ≯ (BC) ≯ 45

Consequently, the percentage of the population 45 years old or more
(50 per cent. of the total population) who are wage-earners lies between
50 and 90 per cent.

It is worth while examining whether these limits are the narrowest
possible which can be assigned with the available data; and it is easy to
see that they are. For if x = 15 and y = 25, (BC) = 25; and if x = 30 and
y = 30, (BC) = 45. There is nothing in the conditions of the problem to
prevent x and y, and hence (BC), from reaching the limiting values, and
thus no narrowing of the limits is possible.
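The device of this example, introducing the unknowns x and y and requiring every ultimate class-frequency to be non-negative, can be imitated by a direct search. The sketch assumes N = 100 and tries whole-number values of x and y only; the expressions for the ultimate frequencies are those obtained from the data (A) = 50, (B) = 60, (C) = 50, (Aβ) = 5, (Aγ) = 20, with lower-case letters standing for the Greek ones.

```python
def bc_range():
    """Smallest and largest (BC) compatible with the data of Example 2.4."""
    values = set()
    for x in range(0, 101):          # x = (alpha beta gamma)
        for y in range(0, 101):      # y = (ABC)
            ultimate = {
                "ABC": y, "ABc": 45 - y, "AbC": 30 - y, "Abc": y - 25,
                "aBC": x - 15, "aBc": 30 - x, "abC": 35 - x, "abc": x,
            }
            if min(ultimate.values()) >= 0:
                values.add(ultimate["ABC"] + ultimate["aBC"])
    return min(values), max(values)

low, high = bc_range()
# (C) = 50, so the percentage of those aged 45 or over is (BC) / 50 * 100.
print(low, high, 100 * low // 50, 100 * high // 50)
```

Because the limits are attained at whole-number values of x and y, the restriction to integers loses nothing here.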


SUMMARY. 

1. The necessary and sufficient condition for the consistence of a set 
of independent class-frequencies relating to a particular universe is that no 
ultimate class-frequency which may be calculated from them is negative. 

2. In view of the practical importance of the positive class-frequencies,
the form of the consistence conditions is expressed solely in terms of such
frequencies.

3. The conditions may be applied to the examination of inaccurate
or incomplete data. For the latter they may allow us to assign limits to
an unknown class-frequency.





EXERCISES. 

2.1. (For this and similar estimates cf. “Report by Miss Collet on the Statistics
of Employment of Women and Girls” [C. 7564], 1894.) If, in the urban district
of Bury, 817 per thousand of the women between 20 and 25 years of age were
returned as “occupied” at the census of 1891, and 268 per thousand as married
or widowed, what is the lowest proportion per thousand of the married or
widowed that must have been occupied?

2.2. If, in a series of houses actually invaded by smallpox, 70 per cent. of the
inhabitants are attacked and 85 per cent. have been vaccinated, what is the
lowest percentage of the vaccinated that must have been attacked?

2.3. Given that 50 per cent. of the inmates of a workhouse are men, 60 per
cent. are “aged” (over 60), 80 per cent. non-able-bodied, 35 per cent. aged
men, 45 per cent. non-able-bodied men, and 42 per cent. non-able-bodied and
aged, find the greatest and least possible proportions of non-able-bodied aged
men.

2.4. (Material from ref. (69).) The following are the proportions per 10,000
of boys observed for certain classes of defects amongst a number of
school-children. A = development defects, B = nerve signs, D = mental dulness.

    N   = 10,000      (D)  = 789
    (A) = 877         (AB) = 338
    (B) = 1,086       (BD) = 455

Show that some dull boys do not exhibit development defects, and state how
many at least do not do so.

2.5. The following are the corresponding figures for girls:

    N   = 10,000      (D)  = 689
    (A) = 682         (AB) = 218
    (B) = 850         (BD) = 363

Show that some defectively developed girls are not dull, and state how many
at least must be so.

2.6. Take the syllogism “All A’s are B’s, all B’s are C’s, therefore all A’s are
C’s,” express the premises in terms of the notation of the preceding chapters,
and deduce the conclusion by the use of the general conditions of consistence.

2.7. Do the same for the syllogism “All A’s are B’s, no B’s are C’s, therefore
no A’s are C’s.”

2.8. Given that (A) = (B) = (C) and that (AB)/N = (AC)/N = p, find
what must be the greatest and least values of p in order that we may infer that
(BC)/N exceeds any given value, say q.

2.9. Show that if

$$\frac{(A)}{N} = x, \qquad \frac{(B)}{N} = 2x, \qquad \frac{(C)}{N} = 3x$$

and

$$\frac{(AB)}{N} = \frac{(AC)}{N} = \frac{(BC)}{N} = y,$$

the value of neither x nor y can exceed ¼.

2.10. A market investigator returns the following data. Of 1000 people
consulted, 811 liked chocolates, 752 liked toffee and 418 liked boiled sweets; 570
liked chocolates and toffee, 356 liked chocolates and boiled sweets and 348 liked
toffee and boiled sweets; 297 liked all three. Show that this information as it
stands must be incorrect.

2.11. (Imaginary data.) 50 per cent. of the imports of barley into a country
come from the Dominions; 80 per cent. of the total imports go to brewing;
75 per cent. of the imports are grown in the Northern hemisphere; 80 per cent.
of Northern-grown barley goes to brewing; 100 per cent. of foreign
Southern-grown barley goes to stock-feeding. Show that the foreign Northern-grown
barley which goes to brewing cannot be less than 30 per cent. nor more than
50 per cent. of the total imports.

(It is assumed that brewing and stock-feeding are the only two uses to which 
imported barley is put.) 

2.12. A penny is tossed three times and the results, heads and tails, noted. 
The process is continued until there are 100 sets of threes. In 69 cases heads 
fell first, in 49 cases heads fell second, and in 53 cases heads fell third. In 33 cases
heads fell both first and second, and in 21 cases heads fell both second and third. 
Show that there must have been at least 5 occasions on which heads fell 
three times, and that there could not have been more than 15 occasions on 
which tails fell three times, though there need not have been any. 





CHAPTER 3. 

ASSOCIATION OF ATTRIBUTES. 


Independence. 

3.1. If there is no sort of relationship of any kind between two
attributes A and B, we expect to find the same proportion of A’s amongst
the B’s as amongst the not-B’s. We may anticipate, for instance, the
same proportion of abnormally wet seasons in leap years as in ordinary
years, the same proportion of male to total births when the moon is waxing
as when it is waning, the same proportion of heads whether a coin be tossed
with the right hand or the left.

Two such unrelated attributes may be termed independent, and we
have accordingly as the criterion of independence for A and B:

$$\frac{(AB)}{(B)} = \frac{(A\beta)}{(\beta)} \tag{3.1}$$

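If this relation hold good, the corresponding relations

$$\frac{(\alpha B)}{(B)} = \frac{(\alpha\beta)}{(\beta)}, \qquad \frac{(AB)}{(A)} = \frac{(\alpha B)}{(\alpha)}, \qquad \frac{(A\beta)}{(A)} = \frac{(\alpha\beta)}{(\alpha)}$$

must also hold. For it follows at once from (3.1) that

$$\frac{(B) - (AB)}{(B)} = \frac{(\beta) - (A\beta)}{(\beta)}$$

that is,

$$\frac{(\alpha B)}{(B)} = \frac{(\alpha\beta)}{(\beta)}$$

and the other two identities may be similarly deduced.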
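
One possible route to the second of these identities may be sketched as follows. It is offered as an illustration rather than as the only deduction, and uses nothing beyond (3.1) and the relations (β) = (Aβ) + (αβ), (B) = (AB) + (αB), (A) = (AB) + (Aβ), (α) = (αB) + (αβ).

$$
\begin{aligned}
(AB)(\beta) &= (A\beta)(B) && \text{from (3.1)}\\
(AB)\,[(A\beta) + (\alpha\beta)] &= (A\beta)\,[(AB) + (\alpha B)] && \text{expanding } (\beta) \text{ and } (B)\\
(AB)(\alpha\beta) &= (A\beta)(\alpha B) && \text{cancelling } (AB)(A\beta)\\
(AB)\,[(\alpha B) + (\alpha\beta)] &= (\alpha B)\,[(AB) + (A\beta)] && \text{adding } (AB)(\alpha B) \text{ to each side}\\
\frac{(AB)}{(A)} &= \frac{(\alpha B)}{(\alpha)}
\end{aligned}
$$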

The student may find it easier to grasp the nature of the relations stated
if the frequencies are supposed grouped into a table with two rows and two
columns, thus:

    Attribute.      B         β         Total.
    A              (AB)      (Aβ)      (A)
    α              (αB)      (αβ)      (α)
    Total          (B)       (β)       N

Equation (3.1) states a certain equality for the columns; if this holds
good, the corresponding equation

$$\frac{(AB)}{(A)} = \frac{(\alpha B)}{(\alpha)}$$

must hold for the rows, and so on.


Forms of the Criterion of Independence. 

3.2. The criterion may, however, be put into a somewhat different
and theoretically more convenient form. The equation (3.1) expresses
(AB) in terms of (B), (β) and a second-order frequency (Aβ); eliminating
this second-order frequency we have:

$$\frac{(AB)}{(B)} = \frac{(AB) + (A\beta)}{(B) + (\beta)} = \frac{(A)}{N}$$

i.e. in words, “the proportion of A’s amongst the B’s is the same as in the
universe at large.” The student should learn to recognise this equation at
sight in any of the forms:

$$
\begin{aligned}
\frac{(AB)}{(B)} &= \frac{(A)}{N} && (a)\\
\frac{(AB)}{(A)} &= \frac{(B)}{N} && (b)\\
(AB) &= \frac{(A)(B)}{N} && (c)\\
\frac{(AB)}{N} &= \frac{(A)}{N} \cdot \frac{(B)}{N} && (d)
\end{aligned}
\tag{3.2}
$$


The equation (d) gives the important fundamental rule: If the attributes
A and B are independent, the proportion of AB’s in the universe is equal to
the proportion of A’s multiplied by the proportion of B’s.

The advantage of the forms (3.2) over the form (3.1) is that they give
expressions for the second-order frequency in terms of the frequencies of
the first order and the whole number of observations alone; the form (3.1)
does not.

Example 3.1. If there are 144 A’s and 384 B’s in 1024 observations,
how many AB’s will there be, A and B being independent?

$$\frac{144 \times 384}{1024} = 54$$

There will therefore be 54 AB’s.

Example 3.2. If the A’s are 60 per cent., the B’s 35 per cent., of the
whole number of observations, what must be the percentage of AB’s in
order that we may conclude that A and B are independent?

$$\frac{60 \times 35}{100} = 21$$

and therefore there must be 21 per cent. (more or less closely, cf. 3.8 and 3.9
below) of AB’s in the universe to justify the conclusion that A and B are
independent.

3.3. It follows from 3.1 that if the relation (3.2) holds for any one of
the four second-order frequencies, e.g. (AB), similar relations must hold
for the remaining three. Thus we have directly from (3.1):

$$\frac{(A\beta)}{(\beta)} = \frac{(AB) + (A\beta)}{(B) + (\beta)} = \frac{(A)}{N}$$

giving

$$(A\beta) = \frac{(A)(\beta)}{N}$$

and so on. This is seen at once to be true on consideration of the fourfold
table on page 34. For if (AB) takes the value (A)(B)/N, (Aβ) must take
the value (A)(β)/N to keep the total of the row equal to (A), and so
on for the other rows and columns. The fourfold table in the case of
independence must in fact have the form:

    Attribute.      B              β              Total.
    A              (A)(B)/N       (A)(β)/N       (A)
    α              (α)(B)/N       (α)(β)/N       (α)
    Total          (B)            (β)            N


Example 3.3. In Example 3.1 above, what would be the number of
αβ’s, A and B being independent?

$$
\begin{aligned}
(\alpha) &= 1024 - 144 = 880\\
(\beta) &= 1024 - 384 = 640\\
(\alpha\beta) &= \frac{880 \times 640}{1024} = 550
\end{aligned}
$$
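
The whole independence table for the figures of Example 3.1 can be filled in by the same rule. The following is a minimal sketch; the function name and the key labels (lower-case letters for the negative attributes) are illustrative choices and not notation from the text. Exact fractions are used so that no rounding enters.

```python
from fractions import Fraction


def independence_table(N, A, B):
    """Second-order frequencies expected when A and B are independent.

    Keys use lower-case letters for the negative attributes:
    "Ab" is (A beta), "aB" is (alpha B), "ab" is (alpha beta).
    """
    alpha, beta = N - A, N - B
    return {
        "AB": Fraction(A * B, N),
        "Ab": Fraction(A * beta, N),
        "aB": Fraction(alpha * B, N),
        "ab": Fraction(alpha * beta, N),
    }


table = independence_table(N=1024, A=144, B=384)
for name, value in table.items():
    print(name, value)
```

The entries for "AB" and "ab" reproduce the 54 and 550 of Examples 3.1 and 3.3, and the four entries together make up the 1024 observations.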


3.4. Finally, the criterion of independence may be expressed in yet a
third form, viz. in terms of the second-order frequencies alone. If A and
B are independent, it follows at once from the preceding section that

$$(AB)(\alpha\beta) = \frac{(A)(B)(\alpha)(\beta)}{N^2}$$

And evidently (αB)(Aβ) is equal to the same fraction.
Therefore

$$
\begin{aligned}
(AB)(\alpha\beta) &= (\alpha B)(A\beta) && (a)\\
\frac{(AB)}{(\alpha B)} &= \frac{(A\beta)}{(\alpha\beta)} && (b)\\
\frac{(AB)}{(A\beta)} &= \frac{(\alpha B)}{(\alpha\beta)} && (c)
\end{aligned}
\tag{3.3}
$$




The equation (b) may be read: “The ratio of A’s to α’s amongst the
B’s is equal to the ratio of A’s to α’s amongst the β’s,” and (c) similarly.
This form of criterion is a convenient one if all the four second-order
frequencies are given, enabling one to recognise almost at a glance whether
or not the two attributes are independent.

Example 3.4. If the second-order frequencies have the following values,
are A and B independent or not?

    (AB) = 110    (αB) = 90    (Aβ) = 290    (αβ) = 510

Clearly

$$(AB)(\alpha\beta) > (\alpha B)(A\beta)$$

so A and B are not independent.
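
The comparison of cross-products made in this example is easily mechanised. The sketch below is illustrative only: the function name and the argument labels (lower-case letters for the negative attributes) are assumptions, and the labels "positive" and "negative" describe only the direction of the divergence from the criterion (3.3).

```python
def association_sign(AB, Ab, aB, ab):
    """Compare the cross-products (AB)(alpha beta) and (alpha B)(A beta)."""
    difference = AB * ab - aB * Ab
    if difference > 0:
        return "positive"
    if difference < 0:
        return "negative"
    return "independent"


# Figures of Example 3.4: 110 * 510 = 56100 against 90 * 290 = 26100.
print(association_sign(AB=110, Ab=290, aB=90, ab=510))
```

Applied to the independence frequencies 54, 90, 330 and 550 found for Example 3.1, the two cross-products are equal and the function returns "independent".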


Association. 

3.5. Suppose now that A and B are not independent, but related in
some way or other, however complicated.
Then if

$$(AB) > \frac{(A)(B)}{N}$$

A and B are said to be positively associated, or sometimes simply
associated. If, on the other hand,

$$(AB) < \frac{(A)(B)}{N}$$

A and B are said to be negatively associated or, more briefly,
disassociated.

The student should carefully note that in statistics the word
“association” has a technical meaning different from the one current in
ordinary speech. In common language one speaks of A and B as being
“associated” if they appear together in a number of cases. But in
statistics A and B are associated only if they appear together in a greater
number of cases than is to be expected if they are independent. Thus,
if we consider means of land transport as dichotomised into road and rail
travel, we may say, in the customary use of the term, that road transport
is associated with speed. But it does not follow that the two are
statistically associated, because rail transport may equally be associated with
speed and, in fact, the attribute speed may be independent of the means
of travel in these two manners.

Association, therefore, cannot be inferred from the mere fact that
some A’s are B’s, however great the proportion; this principle is
fundamental and should always be borne in mind.


Complete Association and Disassociation. 

3.6. We have now to consider in what circumstances we may regard
the association of two attributes as complete. Two courses are open to
us. Either we may say that for complete association all A’s must be
B’s and all B’s must be A’s, in which case it must follow that the A’s
and the B’s occur in the universe in equal numbers; or we may adopt a
rather wider meaning and say that all A’s are B’s or all B’s are A’s,
according to whether the A’s or the B’s are in the minority. Similarly,
complete disassociation may be taken either as the case when no A’s are
B’s and no α’s are β’s, or more widely as the case when either of these
statements is true.

We shall adopt the wider definition in the sequel. Thus two attributes 
are completely associated if one of them cannot occur without the other, 
though the other may occur without the one. 

Measurement of Intensity of Association. 

3.7. It follows from the foregoing that if two attributes are completely
associated, (AB) must be equal to (A) or (B), whichever is the
smaller. If they are completely disassociated, (AB) must be equal to
zero or to (A) + (B) - N, whichever is the greater. (AB) must in general
lie between these two limits. We may thus regard the divergence of
(AB) from the “independence” value (A)(B)/N towards the limiting
value in either direction as indicating the intensity of association or
disassociation, so that we may speak of attributes as being more or less,
highly or slightly, associated. This conception of degrees of association
quantitatively expressible is important, and we return in a later section
to consider the formulae which may be used to measure such degrees.
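
The two limits and the independence value between them may be computed as in the following sketch. The function names and the figures N = 100, (A) = 60, (B) = 70 are hypothetical, chosen only for illustration.

```python
def ab_limits(N, A, B):
    """Least and greatest values that (AB) can take for given N, (A), (B)."""
    lower = max(0, A + B - N)   # complete disassociation
    upper = min(A, B)           # complete association
    return lower, upper


def independence_value(N, A, B):
    """Value of (AB) when A and B are independent."""
    return A * B / N


# Hypothetical figures.
N, A, B = 100, 60, 70
lower, upper = ab_limits(N, A, B)
print(lower, independence_value(N, A, B), upper)   # 30 42.0 60
```

An observed (AB) between 42 and 60 would here indicate positive association, and one between 30 and 42 negative association.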

Sampling Fluctuations. 

3.8. When the association is very slight, i.e. where (AB) only differs
from (A)(B)/N by a few units or by a small proportion, it may be that
such association is not really significant of any definite relationship. To
give an illustration, suppose that a coin is tossed a number of times, and
the tosses noted in pairs; then 100 pairs may give such results as the
following (taken from an actual record):

    First toss heads and second heads . . . 26
    First toss heads and second tails . . . 18
    First toss tails and second heads . . . 27
    First toss tails and second tails . . . 29

If we use A to denote “heads” in the first toss, B “heads” in
the second, we have from the above (A) = 44, (B) = 53. Hence
(A)(B)/N = 23.32, while actually (AB) is 26. Hence there is a
positive association, in the given record, between the result of the first
throw and the result of the second. But it is fairly certain, from the
nature of the case, that such association cannot indicate any real connection
between the results of the two throws; it must therefore be due
merely to such a complex system of causes, impossible to analyse, as leads,
for example, to differences between small samples drawn from the same
material. The conclusion is confirmed by the fact that, of a number of
such records, some give a positive association (like the above), but others
a negative association.
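
The arithmetic of this illustration can be retraced as follows. The sketch merely recomputes the figures quoted from the record; the variable names are illustrative.

```python
# Record of 100 pairs of tosses, as given in the text.
record = {
    ("heads", "heads"): 26,
    ("heads", "tails"): 18,
    ("tails", "heads"): 27,
    ("tails", "tails"): 29,
}

N = sum(record.values())
A = sum(n for (first, _), n in record.items() if first == "heads")
B = sum(n for (_, second), n in record.items() if second == "heads")
AB = record[("heads", "heads")]

independence_value = A * B / N   # (A)(B)/N
excess = AB - independence_value

print(N, A, B)                        # 100 44 53
print(round(independence_value, 2))   # 23.32
print(round(excess, 2))               # 2.68
```

The excess of fewer than three pairs in a hundred is the whole of the apparent association.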

3.9. An event due, like the above occurrence of positive association, 
to an extremely complex system of causes of the general nature of which 
we are aware, but of the detailed operation of which we are ignorant, is 
sometimes said to be due to chance, or better to the chances or fluctuations
of sampling.



A little consideration will suggest that such associations due to the
fluctuations of sampling must be met with in all classes of statistics. To
quote, for instance, from 3.1, two illustrations there given of independent
attributes, we know that in any actual record we would not be
likely to find exactly the same proportion of abnormally wet seasons in
leap years as in ordinary years, nor exactly the same proportion of male
births when the moon is waxing as when it is waning. But so long as the
divergence from independence is not well marked we must regard such
attributes as practically independent, or dependence as at least unproved.

The discussion of the question, how great the divergence must be
before we can consider it as “well marked,” must be postponed to the
chapters dealing with the theory of sampling. At present the attention
of the student can only be directed to the existence of the difficulty, and
to the serious risk of interpreting a “chance association” as physically
significant.


The Choice of a Suitable Form for Testing Association. 

3.10. The definition of 3.5 suggests that we are to test the existence
or the intensity of association between two attributes by a comparison
of the actual value of (AB) with its independence value (as it may be
termed) (A)(B)/N. The procedure is from the theoretical standpoint
perhaps the most natural, but it is more usual, and is simplest and best
in practice, to compare proportions, e.g. the proportion of A’s amongst the
B’s with the proportion amongst the β’s. Such proportions are usually
expressed in the form of percentages or proportions per thousand.

It will be evident from 3.1 and 3.2 that a large number of such comparisons
are available for the purpose, and the question arises, therefore,
which is the best comparison to adopt?

3.11. Two principles should decide this point: (1) of any two comparisons,
that is the better which brings out the more clearly the degree
of association; (2) of any two comparisons, that is the better which
illustrates the more important aspect of the problem under discussion.

The first condition at once suggests that comparisons of the form

$$\frac{(AB)}{(B)} > \frac{(A\beta)}{(\beta)} \tag{3.4}$$

are better than comparisons of the form

$$\frac{(AB)}{(B)} > \frac{(A)}{N} \tag{3.5}$$

For it is evident that if most of the objects or individuals in the universe
are B’s, i.e. if (B)/N approaches unity, (AB)/(B) will necessarily approach
(A)/N even though the difference between (AB)/(B) and (Aβ)/(β) is
considerable. The second form of comparison may therefore be misleading.
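
A numerical case may make the point concrete. In the sketch below the four frequencies are hypothetical, chosen so that 95 per cent. of the universe are B’s; the function name and argument labels are likewise illustrative.

```python
def comparison_gaps(AB, Ab, aB, ab):
    """Differences shown by comparisons of the forms (3.4) and (3.5).

    Lower-case letters denote the negative attributes.
    """
    N = AB + Ab + aB + ab
    A, B, beta = AB + Ab, AB + aB, Ab + ab
    gap_34 = AB / B - Ab / beta   # (AB)/(B) against (A beta)/(beta)
    gap_35 = AB / B - A / N       # (AB)/(B) against (A)/N
    return gap_34, gap_35


# Hypothetical universe of 1000 in which 950 are B's.
gap_34, gap_35 = comparison_gaps(AB=475, Ab=5, aB=475, ab=45)
print(round(gap_34, 2), round(gap_35, 2))   # 0.4 0.02
```

Half the B’s but only one-tenth of the β’s are A’s, a marked difference, yet the proportion of A’s amongst the B’s exceeds that in the universe at large by only 0.02.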

Setting aside, then, comparisons of the general form (3.5), the question
remains whether to apply the comparison of the form (3.4) to the rows or
the columns of the table, if the data are tabulated as on page 34. This
question must be decided with reference to the second principle, i.e. with
regard to the more important aspect of the problem under discussion,
the exact question to be answered, or the hypothesis to be tested, as
illustrated by the examples below. Where no definite question has to be
answered or hypothesis tested both pairs of proportions may be tabulated,
as in Example 3.6.

Example 3.5. Association between inoculation against cholera and
exemption from attack. (Data from Greenwood and Yule, Table III,
ref. (74).)

                        Not attacked.   Attacked.   Total.
    Inoculated               276             3        279
    Not inoculated           473            66        539
    Total                    749            69        818

Here the important question is, How far does inoculation protect from
attack? The most natural comparison is therefore:

    Percentage of inoculated who were not attacked        98.9
    Percentage of not inoculated who were not attacked    87.8

Or we might tabulate the complementary proportions:

    Percentage of inoculated who were attacked             1.1
    Percentage of not inoculated who were attacked        12.2

Either comparison brings out simply and clearly the fact that inoculation
and exemption from attack are positively associated (inoculation and
attack negatively associated).

We are making above a comparison by rows in the notation of the table
on page 34, comparing (AB)/(A) with (αB)/(α), or (Aβ)/(A) with (αβ)/(α).
A comparison by columns, e.g. (AB)/(B) with (Aβ)/(β), would serve
equally to indicate whether there was any appreciable association, but
would not answer directly the particular question we have in mind:

    Percentage of not-attacked who were inoculated    36.8
    Percentage of attacked who were inoculated         4.3
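
The percentages by rows and by columns in this example may be recomputed with a short sketch such as the following. The function names are illustrative; A is here taken to denote “inoculated” and B “not attacked,” with lower-case letters for the negative attributes.

```python
def percentages_by_rows(AB, Ab, aB, ab):
    """Percentage of B's amongst the A's and amongst the alpha's."""
    return 100 * AB / (AB + Ab), 100 * aB / (aB + ab)


def percentages_by_columns(AB, Ab, aB, ab):
    """Percentage of A's amongst the B's and amongst the beta's."""
    return 100 * AB / (AB + aB), 100 * Ab / (Ab + ab)


# A = inoculated, B = not attacked (figures of Example 3.5).
cholera = dict(AB=276, Ab=3, aB=473, ab=66)

rows = percentages_by_rows(**cholera)
columns = percentages_by_columns(**cholera)
print([round(p, 1) for p in rows])      # [98.9, 87.8]
print([round(p, 1) for p in columns])   # [36.8, 4.3]
```

Both pairs point the same way, but only the comparison by rows answers the question how far inoculation protects from attack.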


Example 3.6. Deaf-mutism and Imbecility. (Material from Census
of 1901. Summary Tables. (Cd. 1523).)

    Total population of England and Wales         32,528,000
    Number of the imbecile (or feeble-minded)         48,882
    Number of deaf-mutes                              15,246
    Number of imbecile deaf-mutes                        451

Required, to find whether deaf-mutism is associated with imbecility.

We may denote the number of the imbecile by (A), of deaf-mutes by
(B). A comparison of (AB)/(B) with (A)/N or of (AB)/(A) with (B)/N
may very well be used in this case, seeing that (A)/N and (B)/N are both
small. The question whether to give the preference to the first or the
second comparison depends on the nature of the investigation we wish to
make. If it is desired to exhibit the conditions among deaf-mutes the first
may be used:

    Proportion of imbeciles amongst deaf-mutes = (AB)/(B)        29.6 per thousand
    Proportion of imbeciles in the whole population = (A)/N       1.5 per thousand

If, on the other hand, it is desired to exhibit the conditions amongst
the imbecile, the second will be preferable:

    Proportion of deaf-mutes amongst the imbecile = (AB)/(A)      9.2 per thousand
    Proportion of deaf-mutes in the whole population = (B)/N      0.5 per thousand

Either comparison exhibits very clearly that there exists an association
between the attributes. It may be pointed out, however, that census data
as to such infirmities are very untrustworthy.

Example 3.7. Eye-colour of father and son (material due to Sir
Francis Galton, as given by Professor Karl Pearson, Phil. Trans., A, vol.
195, 1900, p. 198; the classes 1, 2 and 3 of the memoir treated as “light”).

    Fathers with light eyes and sons with light eyes (AB)             471
    Fathers with light eyes and sons with not-light eyes (Aβ)         151
    Fathers with not-light eyes and sons with light eyes (αB)         148
    Fathers with not-light eyes and sons with not-light eyes (αβ)     230

Required to find whether the colour of the son’s eyes is associated with
that of the father’s. In cases of this kind the father is reckoned once for
each son; e.g. a family in which the father was light-eyed, two sons
light-eyed and one not, would be reckoned as giving two to the class AB and one
to the class Aβ.

The best comparison here is:

    Percentage of light-eyed amongst the sons of light-eyed fathers         76 per cent.
    Percentage of light-eyed amongst the sons of not-light-eyed fathers     39 per cent.

But the following is equally valid:

    Percentage of light-eyed amongst the fathers of light-eyed sons         76 per cent.
    Percentage of light-eyed amongst the fathers of not-light-eyed sons     40 per cent.

The reason why the former comparison is preferable is that we usually
wish to estimate the character of offspring from that of the parents, and not
vice versa. Both modes of statement, however, indicate equally clearly that
there is considerable resemblance between father and son.



Example 3.8. Association between inoculation against cholera and
exemption from attack, five separate epidemics (cf. Example 3.5, data from
Tables IX, X, XXVIII, XXIX, XXXI of ref. (74)).

                        Not attacked.   Attacked.    Total.
    Inoculated               192             4         196
    Not inoculated           113            34         147
    Total                    305            38         343

                        Not attacked.   Attacked.    Total.
    Inoculated             5,751            27       5,778
    Not inoculated         6,351           198       6,549
    Total                 12,102           225      12,327

                        Not attacked.   Attacked.    Total.
    Inoculated             4,087             5       4,092
    Not inoculated       113,856         1,144     115,000
    Total                117,943         1,149     119,092

                        Not attacked.   Attacked.    Total.
    Inoculated             8,332             8       8,340
    Not inoculated        84,444           556      85,000
    Total                 92,776           564      93,340

                        Not attacked.   Attacked.    Total.
    Inoculated             4,870             5       4,875
    Not inoculated       153,096           904     154,000
    Total                157,966           909     158,875

With the table of Example 3.5 the above give data for six separate
epidemics, in all of which the same method of inoculation appears to have
been used: the data refer to natives only, and the numbers of observations
are sufficiently large to reduce “fluctuations of sampling” within
reasonably narrow limits. The proportions not attacked are as follows:

                  Proportion not Attacked.
          Not Inoculated.   Inoculated.   Difference.
    1.        0.8776          0.9892        0.1116
    2.        0.7687          0.9796        0.2109
    3.        0.9698          0.9953        0.0255
    4.        0.9901          0.9988        0.0087
    5.        0.9935          0.9990        0.0055
    6.        0.9941          0.9990        0.0049


In each case inoculation and exemption from attack are positively
associated, but it will be seen that the several proportions, and the
differences between them, vary considerably. Evidently in a very mild
epidemic this difference can only be small, and the question arises how
far the data for the separate epidemics can be said to be consistent in
their indication of the “efficiency” of the inoculation. This is not a
simple question to answer: the more advanced student is referred to the
discussion in the original.

The Symbols (AB)₀ and δ.

3.12. The values that the four second-order frequencies take in the
case of independence, viz.

$$\frac{(A)(B)}{N}, \qquad \frac{(\alpha)(B)}{N}, \qquad \frac{(A)(\beta)}{N}, \qquad \frac{(\alpha)(\beta)}{N}$$

are of such great theoretical importance, and of so much use as
reference-values for comparing with the actual values of the frequencies (AB), (αB),
(Aβ) and (αβ), that it is often desirable to employ single symbols to denote
them. We shall use the symbols

$$
(AB)_0 = \frac{(A)(B)}{N}, \qquad (\alpha\beta)_0 = \frac{(\alpha)(\beta)}{N}, \qquad
(\alpha B)_0 = \frac{(\alpha)(B)}{N}, \qquad (A\beta)_0 = \frac{(A)(\beta)}{N}
$$

If δ denote the excess of (AB) over (AB)₀, then, in order to keep the totals
of rows and columns constant, the general table (cf. the table for the case
of independence on page 36) must be of the form

    Attribute.      B              β              Total.
    A              (AB)₀ + δ      (Aβ)₀ - δ      (A)
    α              (αB)₀ - δ      (αβ)₀ + δ      (α)
    Total          (B)            (β)            N

Therefore, quite generally we have:

$$(AB) - (AB)_0 = (\alpha\beta) - (\alpha\beta)_0 = (A\beta)_0 - (A\beta) = (\alpha B)_0 - (\alpha B) = \delta$$

3.13. The value of this common difference δ may be expressed in a
form that is useful to note. We have by definition:

$$\delta = (AB) - (AB)_0 = (AB) - \frac{(A)(B)}{N}$$

Bring the terms on the right to a common denominator, and express all
the frequencies of the numerator in terms of those of the second order;
then we have:

$$
\begin{aligned}
\delta &= \frac{1}{N}\Big\{(AB)\big[(AB) + (\alpha B) + (A\beta) + (\alpha\beta)\big] - \big[(AB) + (A\beta)\big]\big[(AB) + (\alpha B)\big]\Big\}\\
&= \frac{1}{N}\Big\{(AB)(\alpha\beta) - (\alpha B)(A\beta)\Big\}
\end{aligned}
$$

That is to say, the common difference is equal to 1/Nth of the difference
of the “cross-products” (AB)(αβ) and (αB)(Aβ).

It is evident that the difference of the cross-products may be very
large if N be large, although δ is really very small. In using the difference
of the cross-products to test mentally the sign of the association in a case
where all the four second-order frequencies are given, this should be
remembered: the difference should be compared with N, or it will be
liable to suggest a higher degree of association than actually exists.
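
The equality of the two expressions for the common difference can be checked numerically. The sketch below uses the eye-colour frequencies of Example 3.7; the function names and the argument labels (lower-case letters for the negative attributes) are illustrative assumptions.

```python
def delta_from_cross_products(AB, Ab, aB, ab):
    """Common difference as 1/N of the difference of the cross-products."""
    N = AB + Ab + aB + ab
    return (AB * ab - aB * Ab) / N


def delta_from_definition(AB, Ab, aB, ab):
    """Common difference as (AB) less its independence value (A)(B)/N."""
    N = AB + Ab + aB + ab
    A = AB + Ab
    B = AB + aB
    return AB - A * B / N


# Frequencies of Example 3.7 (N = 1000).
eye_colour = dict(AB=471, Ab=151, aB=148, ab=230)

print(round(delta_from_cross_products(**eye_colour), 3))   # 85.982
print(round(delta_from_definition(**eye_colour), 3))       # 85.982
```

The difference of the cross-products is here 85,982, whereas the common difference itself is about 86 in a total of 1000, which shows why the former must be compared with N.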

Example 3.9. The following data were observed for hybrids of Datura
(W. Bateson and Miss Saunders, Report to the Evolution Committee of
the Royal Society, 1902):

    Flowers violet, fruits prickly (AB)    47
    Flowers violet, fruits smooth (Aβ)     12
    Flowers white, fruits prickly (αB)     21
    Flowers white, fruits smooth (αβ)       3

Investigate the association between colour of flower and character of
fruit.

Since 3 × 47 = 141, 12 × 21 = 252, i.e. (AB)(αβ) < (αB)(Aβ), there is
clearly a negative association; 252 - 141 = 111, and at first sight this
considerable difference is apt to suggest a considerable disassociation. But
δ = 111/83 = 1.3 only, and forms a small proportion of the frequency, so
that in point of fact the disassociation is small, so small that no stress can
be laid on it as indicating anything but a fluctuation of sampling. Working
out the percentages we have:

    Percentage of violet-flowered plants with prickly fruits    80 per cent.
    Percentage of white-flowered plants with prickly fruits     88 per cent.


Coefficient of Association. 

3.14. In the previous examples we have judged the association by 
comparing the class-frequencies with those which would exist if the data 
were given by independent attributes, and we can form a rough idea of
the strength of the association by examining the extent of the difference.
This is sufficient for almost all practical purposes, although, if the data
are likely to be affected seriously by fluctuations of random sampling,
some test of the significance of the difference is also necessary. Apart 
from this question, however, it is sometimes convenient to measure the 
intensities of the associations by means of a coefficient. 

It is clearly convenient if such a coefficient can be devised as to be 
zero if the attributes are independent, +1 if they are completely associated 
and -1 if they are completely disassociated. 

3.15. Many such coefficients may be devised, but perhaps the simplest
possible (though not necessarily the most advantageous) is the expression

$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)} = \frac{N\delta}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$$

where δ is the symbol used in 3.12 and 3.13 for the difference
(AB) - (AB)₀. It is evident that Q is zero when the attributes are
independent, for then δ is zero: it takes the value +1 when there is
complete association, for then the second term in both numerator and
denominator of the first form of the expression is zero: similarly it is
-1 where there is complete disassociation, for then the first term in both
numerator and denominator is zero. Q may accordingly be termed a
coefficient of association. As illustrations of the values it will take
in certain cases, the association between deaf-mutism and imbecility, on
the basis of the English census figures (Example 3.6), is +0.91; between
light eye-colour in father and in son (Example 3.7), +0.66; between
colour of flower and prickliness of fruit in Datura (Example 3.9), -0.28, a
disassociation which, however, as already stated, is probably of no practical
significance and due to mere fluctuations of sampling.

The student should note that if all the terms containing A are multiplied
by a constant, the value of Q is unaltered. Similarly for α, B and β.
Hence Q is independent of the relative proportions of A’s and α’s in the
data. This property is important, and renders such a measure of association
specially adapted to cases in which the proportions are arbitrary
(e.g. experiments). A form possessing the same property but certain
marked advantages over Q is suggested in ref. (80).
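
The three values of Q quoted above may be recomputed from the second-order frequencies of the examples, as in the following sketch. The function name and argument labels are illustrative; for Example 3.6 the frequencies (Aβ), (αB) and (αβ) are obtained by subtraction from the census totals.

```python
def coefficient_q(AB, Ab, aB, ab):
    """Coefficient of association Q from the four second-order frequencies.

    Lower-case letters denote the negative attributes. The coefficient is
    undefined when both cross-products vanish.
    """
    first = AB * ab    # (AB)(alpha beta)
    second = Ab * aB   # (A beta)(alpha B)
    return (first - second) / (first + second)


# Example 3.6: A = imbecile, B = deaf-mute.
N, A, B, AB = 32_528_000, 48_882, 15_246, 451
deaf_mutism = coefficient_q(AB, A - AB, B - AB, N - A - B + AB)

eye_colour = coefficient_q(AB=471, Ab=151, aB=148, ab=230)   # Example 3.7
datura = coefficient_q(AB=47, Ab=12, aB=21, ab=3)            # Example 3.9

print(round(deaf_mutism, 2), round(eye_colour, 2), round(datura, 2))
# 0.91 0.66 -0.28
```

Multiplying both frequencies containing A by the same constant, e.g. `coefficient_q(4710, 1510, 148, 230)`, leaves the value for the eye-colour data unchanged, which is the property noted in the preceding paragraph.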

3.16. The coefficient is only mentioned here to direct the attention
of the student to the possibility of forming such a measure of association,
a measure which serves a similar purpose in the case of attributes to that
served by certain other coefficients in the cases of manifold classification
(cf. Chap. 5) and of variables (cf. Chap. 11, and the references to Chaps. 11,
12 and 13). For further illustrations of the use of this coefficient the
reader is referred to ref. (78); for a modified form of the coefficient,
possessing the same properties but certain advantages, to ref. (80);
and for a mode of deducing another coefficient, based on theorems in the
theory of variables, which has come into more general use, though in
the opinion of the present writers its use is of doubtful advantage, to
ref. (76). Reference should also be made to the coefficient described in
13.25. The question of the best coefficient to use as a measure of association
is one on which statisticians differ: for a discussion the student is
referred to refs. (71), (77) and (80).

A Necessary Caution.

3.17. In concluding this chapter, it may be well to repeat, for the
sake of emphasis, that the mere fact of 80, 90 or 99 per cent. of A’s being
B’s implies nothing as to the association of A with B; in the absence of
information, we can but assume that 80, 90 or 99 per cent. of α’s may also
be B’s. In order to apply the criterion of independence for two attributes
A and B, it is necessary to have information concerning α’s and β’s as well
as A’s and B’s, or concerning a universe that includes both α’s and A’s,
β’s and B’s. Hence an investigation as to the causal relations of an
attribute A must not be confined to A’s, but must be extended to α’s
(unless, of course, the necessary information as to α’s is already obtainable):
no comparison is otherwise possible. It would be no use to obtain with
great pains the result (cf. Example 3.6) that 29.6 per thousand of
deaf-mutes were imbecile unless we knew that the proportion of imbeciles in the
whole population was only 1.5 per thousand; nor would it contribute
anything to our knowledge of the heredity of deaf-mutism to find out the
proportion of deaf-mutes amongst the offspring of deaf-mutes unless the
proportions amongst the offspring of normal individuals were also
investigated or known.


SUMMARY. 


1. Two attributes are independent if the proportion of A’s among the
B’s is the same as the proportion among the not-B’s.

2. This definition can be expressed symbolically in numerous forms, in
terms of either first-order or second-order frequencies. The form in which
the data are given, and the question which is to be answered, determine
which form is to be employed in any particular case.

3. Attributes which are not independent are said to be positively
associated if

$$(AB) > \frac{(A)(B)}{N}$$

and negatively associated if

$$(AB) < \frac{(A)(B)}{N}$$

4. The statistical meaning of the word “association” is different from
the meaning ascribed to it in ordinary language.

5. Before association may be said to indicate a definite relation
between the attributes, it is necessary to be satisfied that the divergence
from independence is not due to fluctuations of sampling.

6. The divergence of the actual frequency from the “independence”
frequency is denoted by the symbol δ, and hence

$$\delta = (AB) - \frac{(A)(B)}{N}$$

7. The coefficient of association is defined by

$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$$

It is zero if the attributes are independent, +1 if they are completely
associated and -1 if they are completely disassociated. There are,
however, other forms of coefficient more advantageous in certain cases
(ref. (80)).


EXERCISES. 

3.1. At the census of England and Wales in 1901 there were (to the nearest
1000) 15,729,000 males and 16,799,000 females; 3497 males were returned as
deaf-mutes from childhood, and 3072 females.

State proportions exhibiting the association between deaf-mutism from
childhood and sex. How many of each sex for the same total number would have
been deaf-mutes if there had been no association?

3.2. Show, as briefly as possible, whether A and B are independent, positively
associated or negatively associated in each of the following cases:

(a)   N = [illegible]    (A) = 2850     (B) = 8100     (AB) = 1000
(b)   (A) = 490          (AB) = 294     (α) = 570      (αB) = 380
(c)   (AB) = 256         (αB) = 768     (Aβ) = 48      (αβ) = 144


3.3. (Figures derived from Darwin's “Cross- and Self-fertilisation of Plants.”)
The table below gives the numbers of plants of certain species that were above or
below the average height, stating separately those that were derived from
cross-fertilised and from self-fertilised parentage. Investigate the association
between height and cross-fertilisation of parentage, and draw attention to any
special points you notice.

                        Parentage cross-fertilised.   Parentage self-fertilised.
                        Height:                       Height:
Species.                Above        Below            Above        Below
                        average.     average.         average.     average.

Ipomoea purpurea          63           10               18           65
Petunia violacea          61           16               18           64
Reseda lutea              25            7               11           21
Reseda odorata            39           10               26           30
Lobelia fulgens           17           17               12           22


3.4. (Figures from same source as Example 3.7, p. 41, but material differently
grouped; classes 7 and 8 of the memoir treated as “dark.”) Investigate the
association between darkness of eye-colour in father and son from the following
data:

Fathers with dark eyes and sons with dark eyes              (AB)     50
Fathers with dark eyes and sons with not-dark eyes          (Aβ)     79
Fathers with not-dark eyes and sons with dark eyes          (αB)     89
Fathers with not-dark eyes and sons with not-dark eyes      (αβ)    782

Also tabulate for comparison the frequencies that would have been observed
had there been no heredity, i.e. the values of (AB)₀, (Aβ)₀, etc.

3.5. (Figures from same source as above.) Investigate the association between 
eye-colour of husband and eye-colour of wife (“assortative mating”) from the 
data given below. 

Husbands with light eyes and wives with light eyes              (AB)    309
Husbands with light eyes and wives with not-light eyes          (Aβ)    214
Husbands with not-light eyes and wives with light eyes          (αB)    332
Husbands with not-light eyes and wives with not-light eyes      (αβ)    119

Also tabulate for comparison the frequencies that would have been observed
had there been strict independence between eye-colour of husband and
eye-colour of wife, i.e. the values of (AB)₀, etc., as in Exercise 3.4.

3.6. (Figures from the Census of England and Wales, 1891, vol. 3: the data
cannot be regarded as trustworthy.) The figures given below show the number
of males in successive age-groups, together with the number of the blind (A), of
the mentally deranged (B) and the blind mentally deranged (AB). Trace the
association between blindness and mental derangement from childhood to old
age, tabulating the proportions of insane amongst the whole population and
amongst the blind, and also the association coefficient Q of 3.15. Give a short
verbal statement of your results.








3.7. Show that if

$$(AB)_1,\ (\alpha B)_1,\ (A\beta)_1,\ (\alpha\beta)_1$$
$$(AB)_2,\ (\alpha B)_2,\ (A\beta)_2,\ (\alpha\beta)_2$$

be two aggregates corresponding to the same values of (A), (B), (α) and (β),

$$(AB)_1 - (AB)_2 = (\alpha B)_2 - (\alpha B)_1 = (A\beta)_2 - (A\beta)_1 = (\alpha\beta)_1 - (\alpha\beta)_2 .$$

3.8. Show that if

$$\delta = (AB) - (AB)_0 ,$$

$$(AB)^2 + (\alpha\beta)^2 - (\alpha B)^2 - (A\beta)^2 = [(A) - (\alpha)][(B) - (\beta)] + 2N\delta .$$

3.9. The existence of association may be tested either by comparison of
proportions (e.g. (AB)/(B) with (Aβ)/(β)), as in 3.10 and 3.11, or by the value
of δ, as in 3.12 and 3.13. Show that

$$\delta = \frac{(B)(\beta)}{N}\left\{\frac{(AB)}{(B)} - \frac{(A\beta)}{(\beta)}\right\}
        = \frac{(A)(\alpha)}{N}\left\{\frac{(AB)}{(A)} - \frac{(\alpha B)}{(\alpha)}\right\}.$$

3.10. Spence and Charles, in An Investigation into the Health and Nutrition of
Certain of the Children of Newcastle-on-Tyne between the Ages of One and Five
Years (City and Council of Newcastle-on-Tyne, February 1934), compared two
groups of children, one belonging to the professional classes, 125 in number,
and the other belonging to the labouring classes, 124 in number. They found
the following results:

                              Poor          Well-to-do
                              Children.     Children.
                              Per cent.     Per cent.
Below normal weight             55            13
Above normal weight             11            48

Find the coefficient of association between the weight of the children and their
social status.

3.11. (Data from the Report on the Spahlinger Experiments in Northern
Ireland, 1931-1934, H.M. Stationery Office, 1935.) In experiments on the
immunisation of cattle from tuberculosis the following results were secured:

                                      Died of Tuberculosis   Unaffected or
Cattle.                               or very seriously      only slightly    Total.
                                      affected.              affected.

Inoculated with vaccine                      6                   13             19
Not inoculated or inoculated with
  control media                              8                    3             11

Total                                       14                   16             30

(The cattle were first inoculated with protective vaccine and then deliberately
infected with serious quantities of tubercle germs.)

Find the coefficient of association between inoculation and exemption from
serious tuberculosis.






3.12. Criticise the following argument: “Nearly all the A's are B's, and
therefore A and B must be associated,” and state what suppressed premises
would justify it in the following cases:

“99 per cent. of the people who drink beer die before reaching 100 years of
age. Therefore drinking beer is bad for longevity.”

“99 per cent. of the members who voted for the Army Estimates were
military officers. Therefore it was unfair to suppose that the voting was
unbiassed.”

“In every country where the sale of contraceptives is tolerated by the
Government the birth-rate is declining. Therefore contraception must exert
an influence on the birth-rate.”

3.13. Write down in the form of the table of 3.1 the frequency groups when
(1) all A's are B's; (2) all B's are A's; (3) all A's are B's and all B's are A's;
and the three similar tables when A and B are completely disassociated.





CHAPTER 4. 


PARTIAL ASSOCIATION. 

Association in Sub-universes. 

4.1. In the last chapter we considered the association of two attributes
in a universe without regard to whether any information existed
about other attributes in the universe. If, however, such information
does exist and, say, we can find the frequency-classes of attributes C, D,
etc., the question arises: What are the associations of A and B in the
sub-universes C, γ, CD, etc.?

Thus, if A = standard of health and B = consumption of food, the discussion
of the previous chapter would enable us to examine whether health
and food consumption were associated in any particular universe, say the
population of Great Britain. But we might want to go further than this
and examine the association between A and B among males, or among the
poorer classes, and compare it with the association among females or among
the well-to-do classes, respectively. Defining C = males and D = poor, this
amounts to examining the associations of A and B in the universes C, γ,
D and δ.

4.2. Associations of this kind are of the utmost importance in
statistical practice. As instances of the ways in which they arise let us
consider the following two illustrations:

(1) Suppose that we have established, in the manner of the previous
chapter, a positive association between inoculation and exemption from
smallpox in a universe of persons. It is natural to infer that this association
is due to some causal relation between the two attributes and may be
expected to recur in the future; in short, that smallpox is prevented by
vaccination.

This rather hasty conclusion might, however, meet an opponent who
argues in this way: vaccination is accepted among the well-to-do classes,
but is looked on with suspicion by the lower classes. For this and other
reasons most of the unvaccinated persons are drawn from the lower classes.
But these are precisely the people whom, from the unhygienic conditions
under which they live, one would expect to be exposed to infection and
who, moreover, being malnourished, would be more likely to contract
disease when they were infected. Hence the comparative exemption of
the vaccinated persons is not due to the fact that they have been vaccinated,
but to the fact that they belong to the well-to-do classes. It is, as it were,
an accident that these people also happen to be from a class which favours
vaccination.

Denoting vaccination by A, exemption from attack by B and hygienic
conditions by C, this argument amounts to saying that the observed
association between A and B is not of itself causally direct, but is due to
the associations of both A and B with C.

Now it is clear that this objection could not be lodged if the hygienic
conditions among all the members of the universe were the same. If,
therefore, we examine the association of A and B in the sub-universe C
and still find an association, the supposed argument would be refuted. We
are thus led to a consideration of the association in that sub-universe.

(2) As a second example, suppose that an association is noted between
the presence of an attribute in the father and the presence in the son, and
also between the presence in the grandfather and the presence in the
grandson. The question which arises here is: Does the resemblance between
grandfather and grandson arise from a kind of hereditary transmission
which may, in the common phrase, “skip a generation,” or is it merely
due to the fact that the grandfather is like the father and the father is like
the son?

Denoting the presence of the attribute in the son, father and grandfather
by A, B and C, the question is: Is the association between A and C
due to associations between A and B, and B and C?

If the association between A and C is observed among all the cases in
which the father possesses the attribute or all those in which he does not,
and is still sensible, clearly the association between A and C cannot be due
to associations between A and B, B and C; hence, as before, to resolve
the question we are led to consider the association between A and C in the
sub-universes B and β.

4.3. Generally, ambiguity of the type to which we have just referred 
arises from the fact that the universe of discussion contains not merely 
objects possessing the third attribute alone, but a mixture of objects with 
and without it. To meet the requirements of the discussion we have to 
consider the associations in sub-universes wherein this attribute is entirely 
absent or entirely present. By this means we can go deeper into the nature 
of the underlying causes and eliminate certain possible explanations of the 
type : an association between A and B does not mean that the two are 
directly related, but only that each is associated with a third attribute C. 

Partial Associations. 

4.4. The associations between A and B in sub-universes are called 
partial associations, to distinguish them from the total associations 
between A and B in the universe at large. 

As for total association, A and B are said to be positively associated
in the universe of C's if

$$(ABC) > \frac{(AC)(BC)}{(C)} \tag{4.1}$$

and negatively associated in the converse case.

Similarly they are positively associated in the universe of CD's if

$$(ABCD) > \frac{(ACD)(BCD)}{(CD)} \tag{4.2}$$

and so on. These formulae are derived from the formula for total association
by specifying the universe in which the partial association exists.





Alternative Forms of the Conditions for Partial Association. 

4.5. As in the case of total association, the above forms can be
written in many ways, adapted to the nature of the data and of the question
which is to be answered. The partial association is most conveniently
tested by comparisons of percentages or proportions in the manner of the
previous chapter, and we may quote the four most convenient comparisons
in the case of three attributes:

$$\left.\begin{aligned}
\frac{(ABC)}{(BC)} &> \frac{(AC)}{(C)} \quad (a) &\qquad
\frac{(ABC)}{(AC)} &> \frac{(BC)}{(C)} \quad (b) \\
\frac{(ABC)}{(BC)} &> \frac{(A\beta C)}{(\beta C)} \quad (c) &\qquad
\frac{(ABC)}{(AC)} &> \frac{(\alpha BC)}{(\alpha C)} \quad (d)
\end{aligned}\right\} \tag{4.3}$$

Similar formulae may be written down for the cases of four or more
attributes, and the methods of this chapter are applicable to such cases.
For the sake of simplicity we shall, however, confine ourselves to three
attributes hereafter.

4.6. Let us now consider some examples. 

Example 4.1. (Material from ref. (00).)

The following are the proportions per 10,000 of boys observed with
certain classes of defects amongst a number of school-children. (A)
denotes the number with development defects, (B) the number with
nerve signs, (D) the number of the “dull.”

N       10,000          (AB)      338
(A)        877          (AD)      338
(B)      1,086          (BD)      455
(D)        789          (ABD)     153

The Report from which the figures are drawn concludes that “the connecting
link between defects of body and mental dulness is the coincident
defect of brain which may be known by observation of abnormal nerve
signs.” Discuss this conclusion.

The phrase “connecting link” is a little vague, but it may mean that
the mental defects indicated by nerve signs B may give rise to development
defects A, and also to mental dulness D; A and D being thus
common effects of the same cause B (or another attribute necessarily
indicated by B) and not directly influencing each other. The case is
thus similar to that of the first illustration of 4.2 (liability to smallpox
and to non-vaccination being held to be common effects of the same
circumstances), and may be similarly treated by investigation of the
partial associations between A and D for the universes B and β. As the
ratios (A)/N, (B)/N, (D)/N are small, comparisons of the form (3.5),
page 39, or (4.3) (a) and (b) above, may be used.

The following figures illustrate, then, the association between A and D
for the whole universe, the B-universe and the β-universe:

For the entire material:
  Proportion of the dull = (D)/N                          =  789/10,000 =  7.9 per cent.
  Proportion of the defectively developed who
    were dull = (AD)/(A)                                  =  338/877    = 38.5    „

For those exhibiting nerve signs:
  Proportion of the dull = (BD)/(B)                       =  455/1,086  = 41.9 per cent.
  Proportion of the defectively developed who
    were dull = (ABD)/(AB)                                =  153/338    = 45.3    „

For those not exhibiting nerve signs:
  Proportion of the dull = (βD)/(β)                       =  334/8,914  =  3.7 per cent.
  Proportion of the defectively developed who
    were dull = (AβD)/(Aβ)                                =  185/539    = 34.3    „


The results are extremely striking; the association between A and D
is high both for the material as a whole (the universe at large) and for
those not exhibiting nerve signs (the β-universe), but it is small for those
who do exhibit nerve signs (the B-universe).

This result does not appear to be in accord with the conclusion of the
Report, as we have interpreted it, for the association between A and D
in the β-universe should in that case have been low instead of high.
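
The six proportions of Example 4.1 can be recomputed from the eight given frequencies. The Python sketch below assumes the frequencies are read as N = 10,000, (A) = 877, (B) = 1,086, (D) = 789, (AB) = 338, (AD) = 338, (BD) = 455 and (ABD) = 153; the variable names and the helper `pct` are illustrative only.

```python
# Frequencies per 10,000 boys (assumed reading of the table in Example 4.1).
N, A, B, D = 10_000, 877, 1_086, 789
AB, AD, BD, ABD = 338, 338, 455, 153


def pct(part, whole):
    """Proportion expressed as a percentage."""
    return 100 * part / whole


# For each universe: (proportion of the dull, proportion of the dull among A's).
# Frequencies in the beta-universe follow by subtraction, e.g. (beta D) = (D) - (BD).
comparisons = {
    "whole universe": (pct(D, N), pct(AD, A)),
    "B-universe": (pct(BD, B), pct(ABD, AB)),
    "beta-universe": (pct(D - BD, N - B), pct(AD - ABD, A - AB)),
}

for universe, (dull, dull_among_a) in comparisons.items():
    print(f"{universe:15s} dull: {dull:5.1f}%   dull among A's: {dull_among_a:5.1f}%")
```

The gap between the two percentages is wide in the whole universe and in the β-universe and narrow in the B-universe, which is the pattern discussed in the text.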

Example 4.2. Eye-colour of grandparent, parent and child. (Material
from Sir Francis Galton's “Natural Inheritance” (1889), Table 20, p. 216.
The table only gives particulars for 78 large families with not less than
six brothers or sisters, so that the material is hardly entirely representative,
but serves as a good illustration of the method.) The original data are
treated as in Example 3.7, page 41. Denoting a light-eyed child by A,
parent by B, grandparent by C, every possible line of descent is taken into
account. Thus, taking the following two lines of the table,

      Children.                 Parents.                  Grandparents.
  A.           α.           B.           β.           C.           γ.
  Light-eyed.  Not-         Light-eyed.  Not-         Light-eyed.  Not-
               light-eyed.               light-eyed.               light-eyed.

     4            5            1            1            1            3
     3            4            1            1            4            0

the first would give 4 × 1 × 1 = 4 to the class ABC, 4 × 1 × 3 = 12 to the
class ABγ, 4 to AβC, 12 to Aβγ, 5 to αBC, 15 to αBγ, 5 to αβC and
15 to αβγ; the second would give 3 × 1 × 4 = 12 to the class ABC, 12 to AβC,
16 to αBC, 16 to αβC and none to the remaining classes. The class-frequencies
so derived from the whole table are:

(ABC)     1928          (αBC)     303
(ABγ)      596          (αBγ)     225
(AβC)      552          (αβC)     395
(Aβγ)      508          (αβγ)     501


The following comparisons indicate the association between grandparents
and parents, parents and children, and grandparents and grandchildren,
respectively:

Grandparents and Parents.

Proportion of light-eyed amongst the children
  of light-eyed grandparents                    = (BC)/(C) = 2231/3178 = 70.2 per cent.
Proportion of light-eyed amongst the children
  of not-light-eyed grandparents                = (Bγ)/(γ) =  821/1830 = 44.9    „

Parents and Children.

Proportion of light-eyed amongst the children
  of light-eyed parents                         = (AB)/(B) = 2524/3052 = 82.7 per cent.
Proportion of light-eyed amongst the children
  of not-light-eyed parents                     = (Aβ)/(β) = 1060/1956 = 54.2    „

In both the above cases we are really dealing with the association
between parent and offspring, and consequently the intensity of association
is, as might be expected, approximately the same; in the next case it is
naturally lower:

Grandparents and Grandchildren.

Proportion of light-eyed amongst the grandchildren
  of light-eyed grandparents                    = (AC)/(C) = 2480/3178 = 78.0 per cent.
Proportion of light-eyed amongst the grandchildren
  of not-light-eyed grandparents                = (Aγ)/(γ) = 1104/1830 = 60.3    „

We proceed now to test the partial associations between grandparents 
and grandchildren, as distinct from the total associations given above, in 
order to throw light on the real nature of the resemblance. There are 
two such partial associations to be tested: (1) where the parents are 
light-eyed, (2) where they are not-light-eyed. The following are the 
comparisons :— 


Grandparents and Grandchildren: Parents light-eyed.

Proportion of light-eyed amongst the grandchildren
  of light-eyed grandparents                    = (ABC)/(BC) = 1928/2231 = 86.4 per cent.
Proportion of light-eyed amongst the grandchildren
  of not-light-eyed grandparents                = (ABγ)/(Bγ) =  596/821  = 72.6    „

Grandparents and Grandchildren: Parents not-light-eyed.

Proportion of light-eyed amongst the grandchildren
  of light-eyed grandparents                    = (AβC)/(βC) = 552/947   = 58.3 per cent.
Proportion of light-eyed amongst the grandchildren
  of not-light-eyed grandparents                = (Aβγ)/(βγ) = 508/1009  = 50.3    „

In both cases the partial association is quite well marked and positive ; 
the total association between grandparents and grandchildren cannot, 
then, be due wholly to the total associations between grandparents and 
parents, parents and children, respectively. There is an ancestral heredity , 
as it is termed, as well as a parental heredity. 
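
Every frequency used in Example 4.2 is a sum of the eight third-order class-frequencies. The Python sketch below, assuming those eight frequencies are read as 1928, 596, 552, 508, 303, 225, 395 and 501, shows one way to recover the lower-order frequencies and the proportions; the dictionary layout and the helper `count` are illustrative choices, not part of the text.

```python
# Keys are (A, B, C): 1 = light-eyed, 0 = not-light-eyed
# for child, parent and grandparent respectively.
ultimate = {
    (1, 1, 1): 1928, (1, 1, 0): 596, (1, 0, 1): 552, (1, 0, 0): 508,
    (0, 1, 1): 303,  (0, 1, 0): 225, (0, 0, 1): 395, (0, 0, 0): 501,
}


def count(a=None, b=None, c=None):
    """Sum the classes that match the specified attributes (None = unspecified)."""
    wanted = (a, b, c)
    return sum(
        freq
        for key, freq in ultimate.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    )


def pct(part, whole):
    return round(100 * part / whole, 1)


# Total association, grandparents and grandchildren: (AC)/(C) against (A gamma)/(gamma).
total = (pct(count(a=1, c=1), count(c=1)), pct(count(a=1, c=0), count(c=0)))

# Partial associations: the same comparison within B and within beta.
in_b = (pct(count(1, 1, 1), count(b=1, c=1)), pct(count(1, 1, 0), count(b=1, c=0)))
in_beta = (pct(count(1, 0, 1), count(b=0, c=1)), pct(count(1, 0, 0), count(b=0, c=0)))

print("N =", count(), " total:", total, " in B:", in_b, " in beta:", in_beta)
```

In each pair the first percentage exceeds the second, so the association between grandparent and grandchild is positive both in the universe at large and in each sub-universe of parents.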

We need not discuss the partial association between children and
parents, as it is comparatively of little consequence. It may be noted,
however, as regards the above results, that the most important feature
may be brought out by stating three ratios only.

If A and B are positively associated, (AB)/(B) > (A)/N.

If A and C are positively associated in the universe of B's, (ABC)/(BC)
> (AB)/(B). Hence (A)/N, (AB)/(B) and (ABC)/(BC) form an
ascending series. Thus we have from the given data:

Proportion of light-eyed amongst children in
  general                                       = (A)/N      = 71.6 per cent.
Proportion of light-eyed amongst the children
  of light-eyed parents                         = (AB)/(B)   = 82.7    „
Proportion of light-eyed amongst the children
  of light-eyed parents and grandparents        = (ABC)/(BC) = 86.4    „

If the great-grandparents, etc., etc., were also known the series might
be continued, giving (ABCD)/(BCD), (ABCDE)/(BCDE) and so forth.
The series would probably ascend continuously though with smaller
intervals, A and D being positively associated in the universe of BC's,
A and E in the universe of BCD's, etc.


Notation for Partial Associations. 


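4.7. We now introduce a notation which is analogous to that used
for total associations. It will be remembered that in the last chapter
we wrote:

$$(AB)_0 = \frac{(A)(B)}{N}, \qquad \delta = (AB) - (AB)_0 .$$

We now write:

$$\left.\begin{aligned}
(AB.C)_0 &= \frac{(AC)(BC)}{(C)}, &\quad (AB.CD)_0 &= \frac{(ACD)(BCD)}{(CD)}, \\
\delta_{AB.C} &= (ABC) - (AB.C)_0, &\quad \delta_{AB.CD} &= (ABCD) - (AB.CD)_0, \text{ etc.}
\end{aligned}\right\} \tag{4.4}$$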


The δ-numbers measure the divergence of the actual frequencies from
those which would exist if the attributes were independent in the
sub-universe under discussion.

It is also possible to generalise the coefficient of association Q by
defining partial coefficients of the type

$$Q_{AB.C} = \frac{(ABC)(\alpha\beta C) - (A\beta C)(\alpha BC)}{(ABC)(\alpha\beta C) + (A\beta C)(\alpha BC)}
           = \frac{(C)\,\delta_{AB.C}}{(ABC)(\alpha\beta C) + (A\beta C)(\alpha BC)} .$$

The student will notice that the formulae for the δ-numbers and for
the Q-numbers are obtained from the expressions for total association by
specifying the universe in which the partial association is to be considered.
They need not therefore be memorised.
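
Because the partial δ and Q are the total formulae applied inside a sub-universe, one function serves for both. As a sketch, the Python code below applies it to the association between child (A) and grandparent (C) within the sub-universes B and β, taking the class-frequencies of Example 4.2 as assumed input; the function and argument names are illustrative.

```python
def delta_and_q(both, first_only, second_only, neither):
    """Return (delta, Q) for two attributes inside one (sub-)universe.

    The four arguments are the frequencies of the four combinations of
    the two attributes within that universe.
    """
    size = both + first_only + second_only + neither
    first = both + first_only
    second = both + second_only
    delta = both - first * second / size
    agree = both * neither
    disagree = first_only * second_only
    return delta, (agree - disagree) / (agree + disagree)


# A and C within B:    (ABC), (AB gamma), (alpha BC), (alpha B gamma)
delta_b, q_b = delta_and_q(1928, 596, 303, 225)
# A and C within beta: (A beta C), (A beta gamma), (alpha beta C), (alpha beta gamma)
delta_beta, q_beta = delta_and_q(552, 508, 395, 501)

print(f"in B:    delta = {delta_b:.1f}, Q = {q_b:.3f}")
print(f"in beta: delta = {delta_beta:.1f}, Q = {q_beta:.3f}")
```

Both δ-numbers and both coefficients come out positive, in agreement with the comparison of proportions in Example 4.2.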

Number of Partial Associations. 

4.8. For three attributes A, B, C there are three total associations,
namely, those of A with B, B with C and C with A; and six partial
associations, namely, those of A and B in C and γ, B and C in A and α,
and C and A in B and β.






For four attributes there are fifty-four associations; for we can choose
two attributes from four in six ways, and there are nine associations for
each pair (one total, four partials in the sub-universes specified by one
attribute, and four partials in the sub-universes specified by two).

We state without proof that for n attributes there are
$3^{n-2}\,\dfrac{n(n-1)}{2}$ associations; $\dfrac{n(n-1)}{2}$ of these are
total and the remainder partial. For n > 4 this number is so large as to be
almost unmanageable. For instance, if n = 5 it is 270, and if n = 6 it is 1215.
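
The count stated above can be checked by enumeration. In the Python sketch below (function names are illustrative), each pair of attributes is combined with every way of leaving each remaining attribute unspecified, present or absent, which gives 3^(n-2) universes per pair.

```python
from itertools import combinations, product
from math import comb


def association_counts(n):
    """Return (all, total, partial) association counts from the closed formula."""
    pairs = comb(n, 2)                 # n(n-1)/2 total associations
    everything = pairs * 3 ** (n - 2)  # 3^(n-2) universes for each pair
    return everything, pairs, everything - pairs


def enumerate_counts(n):
    """Obtain the same counts by listing every pair and every sub-universe."""
    total = partial = 0
    for _pair in combinations(range(n), 2):
        # Each of the other n-2 attributes is unspecified, present or absent.
        for spec in product((None, True, False), repeat=n - 2):
            if all(s is None for s in spec):
                total += 1      # universe at large: a total association
            else:
                partial += 1    # some attribute specified: a partial association
    return total + partial, total, partial


for n in range(3, 7):
    print(n, association_counts(n))
```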

4.9. The large number of partial associations which exists might be
thought to occasion some difficulty. We may, however, reassure ourselves
by two considerations.

In the first place, it is rarely necessary to investigate in any practical
instance all the partial associations which are theoretically possible. For
instance, in Example 4.1 the total and partial associations between A
and D were alone investigated; those between A and B, B and D were
not essential for answering the question which was asked. Again, in
Example 4.2 the three total associations and the partial associations
between A and C were all that were necessary.


Relations between Partial Associations. 

4.10. In the second place, a theoretical discussion of the partial
associations is assisted by the following result: The
$3^{n-2}\,\dfrac{n(n-1)}{2}$ associations are all expressible in terms of
$2^n - (n+1)$ algebraically independent associations, together with the
class-frequencies N, (A), (B), (C), etc.

In fact, we saw in Chapter 1 that all the class-frequencies can be
expressed in terms of the positive class-frequencies, which are $2^n$ in
number in the case of n attributes. Hence the frequencies N, (A), (B),
(C), etc., of which there are (n + 1), together with the $2^n - (n+1)$ other
positive frequencies, completely determine the data, and hence determine
the associations, which are expressed in terms of the data. Hence the
number of algebraically independent associations which can be derived
is only $2^n - (n+1)$.

4.11. In practice the existence of these relations is of little or no value.
The formal relations between the ratios and the δ-numbers which express
the associations are, in fact, so complex that lengthy algebraic manipulation
is necessary to express those which are not known in terms of those
which are. It is usually better to evaluate the class-frequencies and
calculate the desired results directly from them.

4.12. There is, however, one result which has important theoretical 
consequences. 

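We have, by definition,

$$\delta_{AB.C} = (ABC) - \frac{(AC)(BC)}{(C)}, \qquad
  \delta_{AB.\gamma} = (AB\gamma) - \frac{(A\gamma)(B\gamma)}{(\gamma)} .$$

Hence,

$$\begin{aligned}
\delta_{AB.C} + \delta_{AB.\gamma}
 &= (AB) - \frac{1}{(C)(\gamma)}\{(AC)(BC)(\gamma) + (A\gamma)(B\gamma)(C)\} \\
 &= (AB) - \frac{1}{(C)(\gamma)}\{N(AC)(BC) - (A)(C)(BC) - (B)(C)(AC) + (A)(B)(C)\} \\
 &= \left\{(AB) - \frac{(A)(B)}{N}\right\}
    - \frac{N}{(C)(\gamma)}\left\{(AC) - \frac{(A)(C)}{N}\right\}\left\{(BC) - \frac{(B)(C)}{N}\right\} \\
 &= \delta_{AB} - \frac{N}{(C)(\gamma)}\,\delta_{AC}\,\delta_{BC} .
\end{aligned} \tag{4.6}$$

This gives us the sum of the δ-numbers for the partial associations of A
and B in C and γ in terms of the total associations between A, B and C.
Now suppose that A and B are independent in C and γ. Then we
have:

$$\delta_{AB.C} = \delta_{AB.\gamma} = 0$$

and

$$\delta_{AB} = \frac{N\,\delta_{AC}\,\delta_{BC}}{(C)(\gamma)} .$$

$\delta_{AB}$ is not zero unless one or both of $\delta_{AC}$, $\delta_{BC}$ are zero.

Hence, if A and B are independent within the universes of C's and
not-C's, they will nevertheless be associated in the universe at large unless
C is independent of A or B or both.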
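
The identity (4.6) can be checked numerically with exact rational arithmetic. The Python sketch below uses the eye-colour frequencies of Example 4.2 as assumed input (any consistent set of frequencies would serve) and compares the two sides; the variable names are illustrative, with `g` standing for γ.

```python
from fractions import Fraction as F

# First- and second-order frequencies, with two third-order ones.
N, A, B, C = 5008, 3584, 3052, 3178
AB, AC, BC = 2524, 2480, 2231
ABC, ABg = 1928, 596            # (ABC) and (AB gamma); their sum is (AB)

# Frequencies in the gamma-universe follow by subtraction.
g = N - C
Ag, Bg = A - AC, B - BC

# Partial delta-numbers for A and B in C and in gamma.
d_AB_C = ABC - F(AC * BC, C)
d_AB_g = ABg - F(Ag * Bg, g)

# Total delta-numbers.
d_AB = AB - F(A * B, N)
d_AC = AC - F(A * C, N)
d_BC = BC - F(B * C, N)

lhs = d_AB_C + d_AB_g
rhs = d_AB - F(N, C * g) * d_AC * d_BC

print(float(lhs), float(rhs), lhs == rhs)
```

Using `Fraction` avoids rounding, so the equality test is exact rather than approximate.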


Illusory Associations. 

4.13. This peculiar result indicates that, although a set of attributes
independent of A and B will not affect the association between them, the
existence of an attribute C with which they are both associated may give
an association in the universe at large which is illusory in the sense that
it does not correspond to any real relationship between them. If the
associations between A and C, B and C are of the same sign, the resulting
association between A and B will be positive; if of opposite signs, negative.

The cases which we discussed at the beginning of this chapter are
instances in point. In the first illustration we saw that it was possible to
argue that the positive associations between vaccination and hygienic
conditions, exemption from attack and hygienic conditions, led to an illusory
association between vaccination and exemption from attack. Similarly, the
question was raised whether the positive association between grandfather
and grandchild may not be due to the positive associations between
grandfather and father, and father and child.

4.14. Misleading associations may easily arise through the mingling
of records which a careful worker would keep distinct.

Take the following case, for example. Suppose there have been
200 patients in a hospital, 100 males and 100 females, suffering from some
disease. Suppose, further, that the death-rate for males (the case mortality)
has been 30 per cent., for females 60 per cent. A new treatment is
tried on 80 per cent. of the males and 40 per cent. of the females, and the
results published without distinction of sex. The three attributes, with
the relations of which we are here concerned, are death, treatment and male
sex. The data show that more males were treated than females, and more
females died than males; therefore the first attribute is associated negatively,
the second positively, with the third. It follows that there will be
an illusory negative association between the first two, death and treatment.
If the treatment were completely inefficient we should, in fact, have the
following results:

                                 Males.    Females.    Total.
Treated and died                   24         24         48
Treated and did not die            56         16         72
Not treated and died                6         36         42
Not treated and did not die        14         24         38

i.e. of the treated, only 48/120 = 40 per cent. died, while of those not
treated 42/80 = 52.5 per cent. died. If this result were stated without any
reference to the fact of the mixture of the sexes, to the different proportions
of the two that were treated and to the different death-rates under normal
treatment, then some value in the new treatment would appear to be
suggested. To make a fair return, either the results for the two sexes
should be stated separately, or the same proportion of the two sexes must
receive the experimental treatment. Further, care would have to be taken
in such a case to see that there was no selection (perhaps unconscious) of
the less severe cases for treatment, thus introducing another source of
fallacy (death positively associated with severity, treatment negatively
associated with severity, giving rise to illusory negative association between
treatment and death).
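
The hospital illustration can be reproduced by building the table under the stated supposition that the treatment has no effect. The Python sketch below assumes death-rates of 30 per cent. for males and 60 per cent. for females, with 80 and 40 per cent. treated; the dictionary names are illustrative.

```python
# Assumed inputs: patients, case mortality and share treated, by sex (per cent).
patients = {"male": 100, "female": 100}
death_pct = {"male": 30, "female": 60}
treated_pct = {"male": 80, "female": 40}

table = {}  # (sex, treated?, died?) -> number of patients
for sex, n in patients.items():
    treated = n * treated_pct[sex] // 100
    for is_treated, group in ((True, treated), (False, n - treated)):
        # An ineffective treatment leaves the death-rate unchanged within each sex.
        died = group * death_pct[sex] // 100
        table[(sex, is_treated, True)] = died
        table[(sex, is_treated, False)] = group - died


def pooled_death_rate(is_treated):
    """Death-rate (per cent.) with the two sexes mixed together."""
    died = sum(v for (s, t, d), v in table.items() if t == is_treated and d)
    everyone = sum(v for (s, t, d), v in table.items() if t == is_treated)
    return 100 * died / everyone


print("treated:    ", pooled_death_rate(True), "per cent. died")
print("not treated:", pooled_death_rate(False), "per cent. died")
```

Within each sex the treated and untreated death-rates are identical by construction; the difference appears only when the sexes are pooled.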

4.15. Illusory associations may also arise in a different way through
the personality of the observer or observers. If the observer's attention
fluctuates, he may be more likely to notice the presence of A when he
notices the presence of B, and vice versa; in such a case A and B (so far as
the record goes) will both be associated with the observer's attention, C,
and consequently an illusory association will be created. Again, if the
attributes are not well defined, one observer may be more generous than
another in deciding when to record the presence of A and also the presence
of B, and even one observer may fluctuate in the generosity of his marking.
In this case the recording of A and the recording of B will both be associated
with the generosity of the observer in recording their presence, C, and an
illusory association between A and B will consequently arise, as before.

Determination of Sign of Association when the Data are Incomplete. 

4.16. It is important to notice that, though we cannot actually
determine the partial associations unless the third-order frequency (ABC)
is given, we can make some conjecture as to their signs from the values of
the second-order frequencies.

In 4.12 we have:

$$\delta_{AB.C} + \delta_{AB.\gamma} = (AB) - \frac{(AC)(BC)}{(C)} - \frac{(A\gamma)(B\gamma)}{(\gamma)} \tag{4.7}$$

Hence, if the expression on the right is positive, one at least of
$\delta_{AB.C}$, $\delta_{AB.\gamma}$ is positive, i.e. A and B are positively
associated either in C or γ or both. Similarly, if the expression is negative,
A and B are negatively associated either in C or in γ or in both. Finally,
if the expression is zero, A and B are either independent in both C and γ,
or positively associated in one and negatively in the other.

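The expression (4.7) may be thrown into a form more convenient when
percentages are given. Dividing through by (B) we have:

$$\frac{\delta_{AB.C} + \delta_{AB.\gamma}}{(B)}
  = \frac{(AB)}{(B)} - \frac{(AC)}{(C)}\cdot\frac{(BC)}{(B)} - \frac{(A\gamma)}{(\gamma)}\cdot\frac{(B\gamma)}{(B)} . \tag{4.8}$$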
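
The right-hand side of (4.7) needs only first- and second-order frequencies. As a sketch, the Python function below evaluates it for any pair of attributes X and Y with a third attribute Z; the function and parameter names are illustrative, and the call uses the eye-colour frequencies of Example 4.2 as assumed input, with X = child, Y = grandparent and Z = parent.

```python
def partial_delta_sum(xy, xz, yz, x, y, z, n):
    """Sum of the two partial delta-numbers of X and Y in Z and in not-Z.

    Uses (XY) - (XZ)(YZ)/(Z) - (X not-Z)(Y not-Z)/(not-Z), where the
    not-Z frequencies are obtained by subtraction.
    """
    not_z = n - z
    x_not_z = x - xz
    y_not_z = y - yz
    return xy - xz * yz / z - x_not_z * y_not_z / not_z


# X = A (child), Y = C (grandparent), Z = B (parent).
total = partial_delta_sum(xy=2480, xz=2524, yz=2231, x=3584, y=3178, z=3052, n=5008)
print(f"sum of partial deltas = {total:.1f}")
```

A positive result means that at least one of the two partial associations is positive, even though the third-order frequencies were not used.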

The following examples illustrate the method. 

Example 4.3. (Figures compiled from Supplement to the Fifty-fifth
Annual Report of the Registrar-General (C.—8508), 1897.) The following
are the death-rates per thousand per annum, and the proportions over
65 years of age, of occupied males in general, farmers, textile workers and
glass workers (over 15 years of age in each case) during the decade 1891-
1900 in England and Wales.

                                      Death-rate        Proportion per thousand
                                      per thousand.     over 65 Years of Age.

Occupied males over 15                   15.8                  46
Farmers, males over 15                   19.6                 132
Textile workers, males over 15           15.9                  34
Glass workers, males over 15             16.6                  16


Would farming, textile working and glass working seem to be relatively 
healthy or unhealthy occupations, given that the death-rates among 
occupied males from 15-65 and over 05 years of age arc 11*5 and 102*8 
per thousand, respectively ? 

If A denote deaths B the given occupation , C old age , we have to apply 
the principle of equation (1.8), calculate what would be the death-rate 
for each occupation on the supposition that the death-rates for occupied 
males in general (11*5, 102*8) apply to each of its separate age-groups 
(under 65, over 65), and see whether the total death-rate so calculated 
exceeds or falls short of the actual death-rate. If it exceeds the actual 
rate, the occupation must on the whole be healthy ; it if falls short, 
unhealthy. Thus we have the following calculated death-rates : — 

Farmers . . . ] 1*5 \ 0*868 -f 102*8 x 0*182 -28*5 

Textile workers . 11 *5 x 0*966 p102*8 x 0*081 -14*6 

Glass workers . . 11*5 x 0*981 f 102*8 x 0*016 - 18*0 

The calculated rate for farmers largely exceeds the actual rate; farming
then must, on the whole, as one would expect, be a healthy occupation.
The death-rate for either young farmers or old farmers, or both, must be
less than for occupied males in general (the last is actually the case); the
high death-rate observed is due solely to the large proportion of the aged.
Textile working, on the other hand, appears to be unhealthy (14.6 < 15.9),
and glass working still more so (13.0 < 16.6); the actual low total death-rates
are due merely to low proportions of the aged.
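
The comparison in Example 4.3 can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the names `calculated_rate` and `verdict` are hypothetical, and the proportions over 65 are taken as 132, 34 and 16 per thousand, the values implied by the multipliers 0.132, 0.034 and 0.016 in the calculated death-rates.

```python
# Death-rates per thousand for occupied males in general, by age-group (from the example).
RATE_UNDER_65 = 11.5
RATE_OVER_65 = 102.8

# occupation -> (actual death-rate per thousand, proportion per thousand over 65)
# The proportions are an assumption read off the multipliers used in the example.
OCCUPATIONS = {
    "Farmers": (19.6, 132),
    "Textile workers": (15.9, 34),
    "Glass workers": (16.6, 16),
}


def calculated_rate(over_65_per_thousand):
    """Death-rate the occupation would show if each age-group died at the general rate."""
    old = over_65_per_thousand / 1000.0
    return RATE_UNDER_65 * (1.0 - old) + RATE_OVER_65 * old


def verdict(actual, calculated):
    """'healthy' when the calculated rate exceeds the actual one, otherwise 'unhealthy'."""
    return "healthy" if calculated > actual else "unhealthy"


results = {}
for name, (actual, over_65) in OCCUPATIONS.items():
    calc = calculated_rate(over_65)
    results[name] = (calc, verdict(actual, calc))
    print(f"{name:16s} actual {actual:5.1f}  calculated {calc:5.1f}  {results[name][1]}")
```

The calculated rates agree with those of the example to within rounding in the last figure.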





It is evident that age-distributions vary so largely from one occupation
to another that total death-rates are liable to be very misleading, so
misleading, in fact, that they are not tabulated at all by the Registrar-General;
only death-rates for narrow limits of age (5 or 10 year age-classes) are 
worked out. Similar fallacies are liable to occur in comparisons of local 
death-rates, owing to variations not only in the relative proportions of the 
old, but also in the relative proportions of the two sexes. 

It is hardly necessary to observe that as age is a variable quantity, the 
above procedure for calculating the comparative death-rates is extremely 
rough. The death-rate of those engaged in any occupation depends not 
only on the mere proportions over and under 65, but on the relative 
numbers at every single year of age. The simpler procedure brings out, 
however, better than a more complex one, the nature of the fallacy involved 
in assuming that crude death-rates are measures of healthiness. 

Example 4.4. Eye-colour in grandparent, parent and child. (The
figures are those of Example 4.2.)

A, light-eyed child; B, light-eyed parent; C, light-eyed grandparent.

N   = 5008        (AB) = 2524
(A) = 3584        (AC) = 2480
(B) = 3052        (BC) = 2231
(C) = 3178


Given only the above data, investigate whether there is probably a 
partial association between child and grandparent. 

If there were no partial association we should have:

$$
(AC) = \frac{(AB)(BC)}{(B)} + \frac{(A\beta)(\beta C)}{(\beta)}
     = \frac{2524 \times 2231}{3052} + \frac{1060 \times 947}{1956}
     = 1845.0 + 513.2 = 2358.2
$$

Actually (AC) = 2480; there must, then, be partial association either in
the B-universe, the β-universe, or both. In the absence of any reason to
the contrary, it would be natural to suppose there is a partial association
in both, i.e. that there is a partial association with the grandparent
whether the line of descent passes through “light-eyed” or “not-light-eyed”
parents; but this could not be proved without a knowledge of the
class-frequency (ABC).
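
The arithmetic of Example 4.4 can be checked with a short Python sketch. It is illustrative only; the variable names are hypothetical, and the frequencies are those that make the worked figures (1845.0, 513.2, 2358.2) consistent.

```python
# Class-frequencies for child (A), parent (B) and grandparent (C), light-eyed in each case.
N = 5008
A, B, C = 3584, 3052, 3178
AB, AC, BC = 2524, 2480, 2231

# Frequencies involving "not-light-eyed parent" (beta), obtained by subtraction.
beta = N - B          # (beta)
A_beta = A - AB       # (A beta)
beta_C = C - BC       # (beta C)

# Value (AC) would take if A and C were independent both among B's and among beta's.
expected_AC = AB * BC / B + A_beta * beta_C / beta
excess = AC - expected_AC

print(f"expected (AC) = {expected_AC:.1f}, actual (AC) = {AC}, excess = {excess:.1f}")
```

A positive excess means the observed (AC) is larger than independence in both sub-universes would allow, which is the argument of the example.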

Complete Independence. 

4.17. The particular case in which all the $2^n - (n + 1)$ given associations
are zero is worth some special investigation.

It follows, in the first place, that all other possible associations must be
zero, i.e. that a state of complete independence, as we may term it,
exists. Suppose, for instance, that we are given:



$$
(AB) = \frac{(A)(B)}{N}, \qquad (AC) = \frac{(A)(C)}{N}, \qquad (BC) = \frac{(B)(C)}{N},
$$

$$
(ABC) = \frac{(AC)(BC)}{(C)} = \frac{(A)(B)(C)}{N^2}
$$


Then it follows at once that we have also:

$$
(ABC) = \frac{(AB)(BC)}{(B)} = \frac{(AB)(AC)}{(A)}
$$

i.e. A and C are independent in the universe of B's, and B and C in the
universe of A's. Again,

$$
(AB\gamma) = (AB) - (ABC) = \frac{(A)(B)}{N} - \frac{(A)(B)(C)}{N^2}
           = \frac{(A)(B)(\gamma)}{N^2} = \frac{(A\gamma)(B\gamma)}{(\gamma)}
$$

Therefore A and B are independent in the universe of γ's. Similarly, it
may be shown that A and C are independent in the universe of β's, B and
C in the universe of α's.

In the next place it is evident from the above that relations of the
general form (to write the equation symmetrically)

$$
\frac{(ABC)}{N} = \frac{(A)}{N} \cdot \frac{(B)}{N} \cdot \frac{(C)}{N} \qquad (4.9)
$$

must hold for every class-frequency. This relation is the general form of
the equation of independence (3.2) (d).

4.18. It must be noted, however, that (4.9) is not a criterion for the
complete independence of A, B and C in the sense that the equation

$$
\frac{(AB)}{N} = \frac{(A)}{N} \cdot \frac{(B)}{N}
$$

is a criterion for the complete independence of A and B. If we are given
N, (A) and (B), and the last relation quoted holds good, we know that
similar relations must hold for (Aβ), (αB) and (αβ). If N, (A), (B) and
(C) be given, however, and the equation (4.9) holds good, we can draw no
conclusion without further information; the data are insufficient. There
are eight algebraically independent class-frequencies in the case of three
attributes, while N, (A), (B), (C) are only four: the equation (4.9) must
therefore be shown to hold good for four frequencies of the third order
before the conclusion can be drawn that it holds good for the remainder, i.e.
that a state of complete independence subsists. The direct verification of
this result is left for the student.

Quite generally, if N, (A), (B), (C), ... be given, the relation

$$
\frac{(ABC \ldots)}{N} = \frac{(A)}{N} \cdot \frac{(B)}{N} \cdot \frac{(C)}{N} \cdots \qquad (4.10)
$$

must be shown to hold good for $2^n - (n + 1)$ of the nth order classes before it
may be assumed to hold good for the remainder. It is only because

$$
2^n - (n + 1) = 1
$$

when n = 2 that the relation

$$
\frac{(AB)}{N} = \frac{(A)}{N} \cdot \frac{(B)}{N}
$$

may be treated as a criterion for the independence of A and B. If all the
n (n > 2) attributes are completely independent, the relation (4.10) holds
good; but it does not follow that if the relation (4.10) holds good they are
all independent.
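
The point of 4.18 can be illustrated numerically. The sketch below uses a small set of ultimate class-frequencies constructed for this purpose (they are not from the text) in which relation (4.9) holds for (ABC) and yet A and B are not independent. The helper `freq` is a hypothetical name.

```python
from fractions import Fraction

# Hypothetical ultimate class-frequencies, keyed by (A, B, C) with 1 = present, 0 = absent.
ULTIMATE = {
    (1, 1, 1): 2, (1, 1, 0): 4, (1, 0, 1): 2, (1, 0, 0): 0,
    (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 3, (0, 0, 0): 3,
}


def freq(a=None, b=None, c=None):
    """Class-frequency with the given attributes fixed; None leaves an attribute unspecified."""
    wanted = (a, b, c)
    return sum(
        count
        for key, count in ULTIMATE.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    )


N = freq()

# Relation (4.9) for the single third-order class (ABC).
holds_4_9 = Fraction(freq(1, 1, 1), N) == (
    Fraction(freq(a=1), N) * Fraction(freq(b=1), N) * Fraction(freq(c=1), N)
)

# Criterion of independence for A and B in the universe at large.
ab_independent = freq(1, 1) * N == freq(a=1) * freq(b=1)

print("(4.9) holds for (ABC):", holds_4_9)
print("A and B independent:  ", ab_independent)
```

Here (A) = (B) = (C) = 8 and N = 16, so (4.9) requires (ABC) = 2, which is satisfied, while (AB) = 6 instead of the 4 that independence would give.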


SUMMARY. 

1. The association of A and B in sub-universes of the type C, γ, CD,
CDE, etc. is called a partial association.

2. If

$$
(ABC) > \frac{(AC)(BC)}{(C)}
$$

A and B are positively associated in C; and if

$$
(ABC) < \frac{(AC)(BC)}{(C)}
$$

A and B are negatively associated in C.

3. There are $\frac{n(n-1)}{2} \cdot 3^{n-2}$ associations in a universe characterised
by n attributes, $\frac{n(n-1)}{2}$ of which are total and the remainder partial.

4. All the associations are expressible in terms of N, (A), (B), (C),
etc., and $2^n - (n + 1)$ algebraically independent associations. These relations
have, however, only a theoretical value.

5. If A and B are independent within the universe of C's they will
nevertheless be associated within the universe at large, unless C is
independent of either A or B or both.

6. In interpreting an association between A and B it must be remembered
that this may arise owing to associations of A with C and B with
C. To resolve this point it is necessary to consider the partial associations
of A and B in C and γ.

7. Complete independence of n attributes occurs if $2^n - (n + 1)$ algebraically
independent associations and hence all associations are zero. In this
case

$$
\frac{(ABC \ldots)}{N} = \frac{(A)}{N} \cdot \frac{(B)}{N} \cdot \frac{(C)}{N} \cdots
$$

but this last condition is not sufficient for complete independence.





EXERCISES. 

4.1. Take the following figures for girls corresponding to those for boys in
Example 4.1, page 52, and discuss them similarly, but not necessarily using
exactly the same comparisons, to see whether the conclusion that “the connecting
link between defects of body and mental dulness is the coincident defect
of brain which may be known by observation of abnormal nerve signs” seems
to hold good.

A, development defects; B, nerve signs; D, mental dulness.

N      10,000        (AB)     248
(A)       682        (AD)     307
(B)       850        (BD)     363
(D)       689        (ABD)    128


4.2. (Material from Census of England and Wales, 1891, vol. 3.) The following
figures give the numbers of those suffering from single or combined infirmities:
(1) for all males; (2) for males of 55 years of age and over.

A, blindness; B, mental derangement; C, deaf-mutism.

              (1)            (2)                       (1)           (2)
          All Males.     Males 55-.                All Males.    Males 55-.
N         14,053,000     1,377,000      (AB)          188            65
(A)           12,281         5,588      (AC)           51            14
(B)           45,892        10,809      (BC)          299            47
(C)            7,707           740      (ABC)          11             8

Tabulate proportions per thousand, exhibiting the total association between
blindness and mental derangement, and the partial association between the
same two infirmities among deaf-mutes: (1) for males in general; (2) for those of
55 years of age and over. Give a short verbal statement of the results, and
contrast them with those of Exercise 4.1.

4.3. (Material from Supplement to the Fifty-fifth Annual Report of the
Registrar-General.)

The death-rate from cancer for occupied males in general (over 15) is 0 085
per thousand per annum, and for farmers 1.20.

The death-rates from cancer for occupied males under and over 45
are O K) and 2.25 respectively. Of the farmers, 40.1 per cent. are over 45.

Would you say that farmers were peculiarly liable to cancer?

4.4. A population of males over 15 years of age consists of 7 per cent. over 65
years of age and 93 per cent. under. The death-rates are 12 per thousand per
annum in the younger class and 110 in the older, or 18.86 in the whole population.
The death-rate of males (over 15) engaged in a certain industry is 20.7 per
thousand.

If the industry be not unhealthy, what must be the approximate proportion
of those over 65 engaged in it (neglecting minor differences of age distribution)?

4.5. Show that if A and B are independent, while A and C, B and C are
associated, A and B must be disassociated either in the universe of C's, the
universe of γ's, or both.

4.6. As an illustration of Exercise 4.5, show that if the following were actual
data, there would be a slight disassociation between the eye-colours of husband
and wife (father and mother) for the parents either of light-eyed sons or
not-light-eyed sons, or both, although there is a slight positive association for parents
at large.

A, light eye-colour in husband; B, in wife; C, in son:

N      1000        (AB)    358
(A)     622        (AC)    471
(B)     558        (BC)    419
(C)     617




4.7. Show that if (ABC) = (αβγ), (αBC) = (Aβγ), and so on (the case of
“complete equality of contrary frequencies” of Exercise 1.7), A, B
and C are completely independent if A and B, A and C, B and C are
independent pair and pair.

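4.8. If, in the same case of complete equality of contraries,

$$
(AB) = N/4 + \delta_1, \qquad (AC) = N/4 + \delta_2, \qquad (BC) = N/4 + \delta_3,
$$

show that

$$
(ABC) - \frac{(AC)(BC)}{(C)} = (AB\gamma) - \frac{(A\gamma)(B\gamma)}{(\gamma)}
  = \frac{1}{2}\left\{ \delta_1 - \frac{4\delta_2\delta_3}{N} \right\}
$$

so that the partial associations between A and B in the universes C and γ are
positive or negative according as

$$
\delta_1 \gtrless \frac{4\delta_2\delta_3}{N}
$$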


4.9. In the simple contests of a general election (contests in which one
Conservative opposed one Socialist and there were no other candidates) 60 per
cent. of the winning candidates (according to the returns) spent more money
than their opponents. Given that Gil per cent. of the winners were Conservatives,
and that the Conservative expenditure exceeded the Socialist in 80 per cent. of
the contests, find the percentages of elections won by Conservatives (1) when
they spent more and (2) when they spent less than their opponents, and hence
say whether you consider the above figures evidence of the influence of expenditure
on election results or no. (Note that if the one candidate in a contest be a
Conservative-winner-who-spends-more-than-his-opponent, the other must necessarily
be a Socialist-loser-who-spends-less, and so forth. Hence the case is one of
complete equality of contraries.)

4.10. Given that (A)/N = (B)/N = (C)/N = x, and that (AB)/N = (AC)/N = y,
find the major and minor limits to y that enable one to infer positive association
between B and C, i.e. (BC)/N > x².

Draw a diagram on squared paper to illustrate your answer, taking x and y
as co-ordinates, and shading the limits within which y must lie in order to
permit of the above inference. Point out the peculiarities in the case of
inferring a positive association from two negative associations.

4.11. Discuss similarly the more complex case (A)/N = x, (B)/N = 2x,
(C)/N = 3x:

(1) for inferring positive association between B and C given (AB)/N = (AC)/N = y,

(2) for inferring positive association between A and C given (AB)/N = (BC)/N = y,

(3) for inferring positive association between A and B given (AC)/N = (BC)/N = y.



CHAPTER 5.


MANIFOLD CLASSIFICATION. 

Manifold Classification. 

5.1. Instead of dividing the universe of discourse into two parts by
a simple dichotomy, we may also divide it into a number of parts by a
similar process. For instance, we can extend the dichotomy of the
universe of men into “those with blue eyes” and “those not with blue
eyes” to a threefold division: “those with blue eyes,” “those with
brown eyes,” and “those with neither blue nor brown eyes”; or into a
fourfold division by adding a fresh category, “those with grey eyes”;
and so on.

Generally, our universe may be divided first according to s heads,
$A_1, A_2, \ldots, A_s$; each of the classes so obtained into t heads,
$B_1, B_2, \ldots, B_t$; each of these into u heads, $C_1, C_2, \ldots, C_u$; and
so on.

This is called manifold classification. 

5.2. The general theory of manifold classification for n attributes is
rather complicated, but its fundamental principles are very similar to
those which apply to dichotomy. A straightforward extension of the
methods of Chapter 1 will give the following results, which we are content
to announce without a formal proof:

(a) There are s × t × u × ... ultimate classes.

(b) The total number of classes, including N and the ultimate classes,
is (s + 1)(t + 1)(u + 1) ...

(c) The data are consistent if, and only if, every ultimate class-frequency
is not negative.

(d) The data are completely specified by s × t × u × ... algebraically
independent class-frequencies. Even if all these are not given, it may be
possible to set limits to the other class-frequencies.

For example, if the population of the United Kingdom is classified
geographically according to habitation in England, Wales, Scotland and
Northern Ireland; by eye-colour into blue, brown, grey, green and the
remainder; and by hair-colour into black, fair, red and the remainder;
there will be 150 classes altogether expressible in terms of 80 independent
class-frequencies.
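
The two counts in results (a) and (b) are simple products, and the United Kingdom example can be checked with a minimal Python sketch. The function names are hypothetical and chosen only for this illustration.

```python
from math import prod


def ultimate_classes(heads):
    """Number of ultimate classes: s x t x u x ... (result (a))."""
    return prod(heads)


def total_classes(heads):
    """Total number of classes, including N and the ultimate classes (result (b))."""
    return prod(h + 1 for h in heads)


# Four countries, five eye-colour heads, four hair-colour heads.
heads = [4, 5, 4]
print(ultimate_classes(heads), total_classes(heads))
```

Each attribute contributes a factor one greater than its number of heads in (b) because a class may either specify one of the heads or leave the attribute unspecified.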

5.3. Data so completely specified are very rare, and an elaborate
discussion of the general case would hardly be justified by its practical
value. For the remainder of this chapter, therefore, we shall be concerned
solely with the case of two characteristics, A and B.

Contingency Tables. 

5.4. Let us suppose that the classification of the A's is s-fold and
that of the B's is t-fold. Then there will be st classes of the type $A_m B_n$.

Generalising slightly the notation of previous chapters, let the frequency
of individuals $A_m$ be denoted by $(A_m)$ and of individuals $A_m B_n$ by
$(A_m B_n)$. The data can then be set out in the form of a table of t rows
and s columns as follows:

Table 5.1.

                                  Attribute A.
Attribute B.     A_1          A_2         . . .      A_s         Totals.
B_1           (A_1 B_1)    (A_2 B_1)      . . .   (A_s B_1)      (B_1)
B_2           (A_1 B_2)    (A_2 B_2)      . . .   (A_s B_2)      (B_2)
. . .           . . .        . . .        . . .     . . .        . . .
B_t           (A_1 B_t)    (A_2 B_t)      . . .   (A_s B_t)      (B_t)
Totals          (A_1)        (A_2)        . . .     (A_s)          N


In this table the frequency of the class $A_m B_n$ is entered in the
compartment common to the mth column and the nth row: the totals at the
ends of rows and at the foot of columns give the first-order frequencies,
i.e. the numbers of $A_m$'s and $B_n$'s; and finally, the grand total in the
bottom right-hand corner gives the whole number of observations.

Such a table is called a contingency table. It is a generalised form
of the fourfold (2 × 2-fold) table in 3.1.

Example 5.1. In Table 5.2 below the classification is 3 × 4-fold:
the eye-colours are classed under the three heads “blue,” “grey or
green” and “brown,” while the hair-colours are classed under four
heads, “fair,” “brown,” “black” and “red.”

Table 5.2. Hair- and Eye-colours of 6800 Males in Baden.
(Ammon, Zur Anthropologie der Badener.)

                                Hair-colour.
Eye-colour.         Fair.    Brown.    Black.    Red.     Total.
Blue                1768       807       189      47       2811
Grey or Green        946      1387       746      53       3132
Brown                115       438       288      16        857
Total               2829      2632      1223     116       6800

Taking the first row, the table tells us that there were 2811 men with blue
eyes noted, of whom 1768 had fair hair, 807 brown hair, 189 black hair and
47 red hair. Similarly, from the first column, there were 2829 men with fair
hair, of whom 1768 had blue eyes, 946 grey or green eyes and 115 brown eyes.






Association in Contingency Tables. 

5.5. For the purpose of discussing the nature of the relation between
the A's and the B's, any such table may be treated on the principles of
the preceding chapters by reducing it in different ways to a 2 × 2-fold form.
It then becomes possible to trace the association between any one or more
of the A's and any one or more of the B's, either in the universe at large
or in universes limited by the omission of one or more of the A's, of the
B's, or of both.

If, e.g., we desire to trace the association between a lack of pigmentation
in eyes and in hair, rows 1 and 2 may be pooled together as
representing the least pigmentation of the eyes, and columns 2, 3 and 4
may be pooled together as representing hair with a more or less marked
degree of pigmentation. We then have:

Proportion of light-eyed with fair hair . . .   2714/5943 = 46 per cent.
Proportion of brown-eyed with fair hair . . .    115/857  = 13 per cent.

The association is therefore well marked. For comparison we may trace
the corresponding association between the most marked degree of pigmentation
in eyes and hair, i.e. brown eyes and black hair. Here we must add
together rows 1 and 2 as before, and pool columns 1, 2 and 4, the column
for red being really misplaced, as red represents a comparatively slight
degree of pigmentation. The figures are:

Proportion of brown-eyed with black hair . .    288/857  = 34 per cent.
Proportion of light-eyed with black hair . .    935/5943 = 16 per cent.

The association is again positive and well marked, but the difference
between the two percentages is rather less than in the last case.
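
The pooling of rows and columns described in 5.5 can be sketched in Python as follows. The frequencies are those of the Baden hair- and eye-colour table (rows: blue, grey or green, brown eyes; columns: fair, brown, black, red hair); the function name `proportion` is hypothetical.

```python
# Rows: blue, grey or green, brown eyes. Columns: fair, brown, black, red hair.
TABLE = [
    [1768, 807, 189, 47],
    [946, 1387, 746, 53],
    [115, 438, 288, 16],
]


def proportion(rows, cols):
    """Share of the individuals in the pooled rows that fall in the pooled columns."""
    inside = sum(TABLE[r][c] for r in rows for c in cols)
    total = sum(sum(TABLE[r]) for r in rows)
    return inside / total


light_fair = proportion([0, 1], [0])    # light-eyed (rows 1 and 2) with fair hair
brown_fair = proportion([2], [0])       # brown-eyed with fair hair
brown_black = proportion([2], [2])      # brown-eyed with black hair
light_black = proportion([0, 1], [2])   # light-eyed with black hair

for label, value in [
    ("light-eyed with fair hair", light_fair),
    ("brown-eyed with fair hair", brown_fair),
    ("brown-eyed with black hair", brown_black),
    ("light-eyed with black hair", light_black),
]:
    print(f"{label:28s} {100 * value:4.0f} per cent.")
```

Pooling the remaining columns is implicit here: each proportion compares one hair-colour with all the others taken together.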

5.6. The mode of treatment adopted in the preceding two paragraphs
rests on first principles and, if fully carried out, gives us all the information
possible about the associations of the two attributes. At the same time,
it is laborious if s and t are at all large. Moreover, in practical work we are
often concerned, not with the associations of individual A's with individual
B's, but with finding the answer to a general question of the type: Are the
A's on the whole distinctly dependent on the B's, and if so, is this dependence
very close, or the reverse? In fact, what we want is a coefficient
which will summarise the general nature of the dependence. We will
proceed to discuss two such coefficients.

Coefficients of Contingency. 

5.7. If the A's and B's be completely independent in the universe at
large, we must have for all values of m and n:

$$
(A_m B_n) = \frac{(A_m)(B_n)}{N} = (A_m B_n)_0 \qquad (5.1)
$$

If, however, A and B are not completely independent, $(A_m B_n)$ and $(A_m B_n)_0$
will not be identical for all values of m and n. Let the difference be given
by

$$
\delta_{mn} = (A_m B_n) - (A_m B_n)_0 \qquad (5.2)
$$

Let us note in passing the following properties of these quantities:

(1) In the first place, $\delta_{mn}$ is not equal to $\delta_{nm}$.

(2) In the second place, the δ's are not all algebraically independent.

We have, in fact, for any particular m:

$$
\delta_{m1} + \delta_{m2} + \delta_{m3} + \ldots + \delta_{mt}
  = (A_m) - \frac{(A_m)}{N}\{(B_1) + (B_2) + \ldots + (B_t)\}
  = 0 \qquad (5.3)
$$

A similar relation is true for any particular n.

Now there are st δ-quantities. In virtue of the relationship we have
just proved, for any particular m only (t − 1) of the t quantities $\delta_{mn}$ are
independent. Similarly, for any n only (s − 1) are independent. Hence
the total number of independent δ's is (s − 1)(t − 1).
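
As an illustration of relation (5.3), the following Python sketch computes the δ's for the Baden hair- and eye-colour frequencies and checks that they sum to zero along every row and every column. The function name `deltas` is hypothetical, and the list indices follow the table layout (row first, then column) rather than the m, n order of the text.

```python
# Rows: blue, grey or green, brown eyes. Columns: fair, brown, black, red hair.
TABLE = [
    [1768, 807, 189, 47],
    [946, 1387, 746, 53],
    [115, 438, 288, 16],
]


def deltas(table):
    """Observed frequency minus independence value, for every compartment."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [
        [table[i][j] - row_tot[i] * col_tot[j] / n for j in range(len(col_tot))]
        for i in range(len(row_tot))
    ]


d = deltas(TABLE)
row_sums = [sum(row) for row in d]
col_sums = [sum(col) for col in zip(*d)]

# Number of algebraically independent deltas, (s - 1)(t - 1).
independent_count = (len(TABLE) - 1) * (len(TABLE[0]) - 1)

print("row sums:", [round(x, 6) for x in row_sums])
print("column sums:", [round(x, 6) for x in col_sums])
print("independent deltas:", independent_count)
```

Because the sums vanish, fixing the δ's in all but one row and one column determines the rest, which is the reason for the count (s − 1)(t − 1).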

5.8. These δ-quantities indicate the extent of the associations, and
we expect a summarising coefficient to be built up from them in some way.
It would, however, be useless to add them together, for in virtue of the
relation of the preceding paragraph the sum is zero. We wish to construct
a coefficient which shall be independent of the signs of the δ-numbers.

We therefore define

$$
\chi^2 = \sum \left\{ \frac{\delta_{mn}^2}{(A_m B_n)_0} \right\} \qquad (5.4)
$$

and call $\chi^2$ the “square contingency.”

We then write:

$$
\phi^2 = \frac{\chi^2}{N} \qquad (5.5)
$$

and call $\phi^2$ the “mean square contingency.”

Clearly $\chi^2$ and $\phi^2$, being the sums of squares, cannot be negative.
They vanish if, and only if, every δ-number vanishes, in which case A and
B are independent.

Pearson’s Coefficient of Mean Square Contingency. 

5.9. The quantity $\phi^2$ is not quite suitable in itself to form a coefficient,
because its limits vary in different cases. Karl Pearson therefore proposed
the coefficient C, defined by

$$
C = \sqrt{\frac{\chi^2}{N + \chi^2}} = \sqrt{\frac{\phi^2}{1 + \phi^2}} \qquad (5.6)
$$

This is called the Coefficient of Mean Square Contingency. In general,
no sign should be attached to the root, for the coefficient merely shows
whether two characters are or are not independent; but in certain cases a
conventional sign may be used. Thus, in Table 5.2 slight pigmentation
of eyes and hair appear to go together, and the contingency may be
regarded as positive. If slight pigmentation of eyes had been associated
with marked pigmentation of hair, the contingency might have been
regarded as negative.

5.10. The coefficient C has one serious disadvantage. Although, as
may be seen from its definition, it increases with $\phi^2$ towards a limit 1, it
never reaches that limit. In fact, the maximum value which it can attain
depends on s and t, and reaches unity only for an infinite number of classes.
This may be briefly illustrated as follows. Replacing $\delta_{mn}$ in equation
(5.4) by its value in terms of $(A_m B_n)$ and $(A_m B_n)_0$, we have:

$$
\chi^2 = \sum \left\{ \frac{(A_m B_n)^2}{(A_m B_n)_0} \right\} - N \qquad (5.7)
$$

and therefore, denoting the summation by S,

$$
C = \sqrt{\frac{S - N}{S}} \qquad (5.8)
$$

Now suppose we have to deal with a t × t-fold classification in which
$(A_m) = (B_m)$ for all values of m; and suppose, further, that the association
between $A_m$ and $B_m$ is perfect, so that $(A_m B_m) = (A_m) = (B_m)$ for all values
of m, the remaining frequencies of the second order being zero; all the
frequency is then concentrated in the diagonal compartments of the table,
and each contributes N to the summation S. The total value of S is accordingly
tN, and the value of C:

$$
C = \sqrt{\frac{t - 1}{t}}
$$

This is the greatest possible value of C for a symmetrical t × t-fold classification,
and therefore, in such a table, for:

t = 2,   C cannot exceed   0.707
t = 3,   "    "      "     0.816
t = 4,   "    "      "     0.866
t = 5,   "    "      "     0.894
t = 6,   "    "      "     0.913
t = 7,   "    "      "     0.926
t = 8,   "    "      "     0.935
t = 9,   "    "      "     0.943
t = 10,  "    "      "     0.949
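
The limiting values of C for a symmetrical t × t-fold classification follow directly from the formula C = √((t − 1)/t). A minimal Python sketch (the function name `max_c` is hypothetical) tabulates them:

```python
from math import sqrt


def max_c(t):
    """Greatest possible value of C for a symmetrical t x t-fold classification."""
    return sqrt((t - 1) / t)


limits = {t: round(max_c(t), 3) for t in range(2, 11)}
for t, value in limits.items():
    print(f"t = {t:2d}   C cannot exceed {value:.3f}")
```

The values approach unity only slowly, which is why coefficients from coarse and fine classifications are not directly comparable.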


5.11. Hence, coefficients calculated from different systems of classification
are not, strictly speaking, comparable. This is clearly undesirable.
Two coefficients calculated from the same data classified in two different
groupings ought not to be very different.

It is as well, therefore, to restrict the use of the C-coefficient to 5 × 5 or
finer groupings. At the same time, the classification must not be made too
fine, or the value of the coefficient is largely affected by casual irregularities
arising from sampling fluctuations.¹

¹ Karl Pearson (ref. (8G) and in several other papers) has discussed a “correction”
to be made to C calculated from coarsely grouped data. The use of such corrections
depends to some extent on assumptions about the universe, and may be regarded as
attempts to bring the value of C closer to a putative coefficient of correlation (cf. 12.20).




Tschuprow’s Coefficient. 

5.12. To remedy the defect to which we have just referred, Tschuprow
has proposed the coefficient T, defined by

$$
T^2 = \frac{\phi^2}{\sqrt{(s - 1)(t - 1)}} \qquad (5.9)
$$

This coefficient varies between 0 and 1 in the desired manner when s = t.

We have

$$
C^2 = \frac{\phi^2}{1 + \phi^2}
    = \frac{\sqrt{(s - 1)(t - 1)}\, T^2}{1 + \sqrt{(s - 1)(t - 1)}\, T^2} \qquad (5.10)
$$

and conversely,

$$
T^2 = \frac{C^2}{(1 - C^2)\sqrt{(s - 1)(t - 1)}} \qquad (5.11)
$$


Calculation of C and T. 

5.13. The calculation of C and T is simplified by the use of equation
(5.8), which enables us to replace the calculation of the δ's by calculations
based on frequencies of types $(A_m)$, $(B_n)$ and $(A_m B_n)$. All these
quantities are contained in the contingency tables. The following example
will illustrate the method:

Example 5.2. Consider the data of Table 5.2. (The classification is
only 3 × 4-fold and is therefore rather crude for calculating C, but it will
serve as an illustration of the form of the arithmetic.)

We require first of all the quantities $(A_m B_n)_0$, i.e. the “independence”
values. These are calculated directly from their definition

$$
(A_m B_n)_0 = \frac{(A_m)(B_n)}{N}
$$

and thus the value for the compartment in the mth column and nth row
is the product of the total frequencies in that column and row divided by
the whole frequency, e.g. $(A_1 B_1)_0$ = 2829 × 2811/6800 = 1169, and so on.

It is convenient to tabulate the frequencies so obtained in a second
contingency table, as in Table 5.3.


Table 5.3. Independence Values of the Frequencies for Table 5.2.

                              Hair-colour.
Eye-colour.         Fair.    Brown.    Black.    Red.
Blue . . . . .      1169      1088       506     48.0
Grey or Green .     1303      1212       563     53.4
Brown . . . .        357       332       154     14.6







We now calculate the quantities $(A_m B_n)^2 / (A_m B_n)_0$:

(1768)²/1169 = 2673.9
 (946)²/1303 =  686.8
 (115)²/357  =   37.0
 (807)²/1088 =  598.6
(1387)²/1212 = 1587.3
 (438)²/332  =  577.8
 (189)²/506  =   70.6
 (746)²/563  =  988.5
 (288)²/154  =  538.6
  (47)²/48.0 =   46.0
  (53)²/53.4 =   52.6
  (16)²/14.6 =   17.5

Total S      = 7875.2

From equation (5.8):

$$
C = \sqrt{\frac{S - N}{S}} = \sqrt{\frac{1075.2}{7875.2}} = \sqrt{0.1365} = 0.37
$$

and

$$
T^2 = \frac{C^2}{(1 - C^2)\sqrt{(s - 1)(t - 1)}} = \frac{0.1365}{0.8635 \times \sqrt{6}} = 0.0645,
\qquad T = \sqrt{0.0645} = 0.25
$$

The squares in such work may conveniently be taken from Barlow's
“Tables of Squares, Cubes, etc.,” or logarithms may be used throughout;
five-figure logarithms are quite sufficient.

It will be seen that T is less than C. This is not always true. Whichever
coefficient we use, however, the contingency between pigmentation
of hair and eye is evident.
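
The whole calculation of Example 5.2 can be reproduced with a short Python sketch. It is an illustration under stated assumptions: the function name `contingency_coefficients` is hypothetical, and the independence values are used unrounded, so S comes out slightly different from the hand-worked total while C and T agree to two decimal places.

```python
from math import sqrt

# Rows: blue, grey or green, brown eyes. Columns: fair, brown, black, red hair.
TABLE = [
    [1768, 807, 189, 47],
    [946, 1387, 746, 53],
    [115, 438, 288, 16],
]


def contingency_coefficients(table):
    """Return (S, C, T) using equations (5.7), (5.8) and (5.9)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    # S is the sum of (observed)^2 / (independence value) over all compartments.
    s_sum = sum(
        table[i][j] ** 2 / (row_tot[i] * col_tot[j] / n)
        for i in range(len(row_tot))
        for j in range(len(col_tot))
    )
    chi2 = s_sum - n                      # equation (5.7)
    c = sqrt(chi2 / (n + chi2))           # identical to sqrt((S - N) / S)
    phi2 = chi2 / n
    t = sqrt(phi2 / sqrt((len(row_tot) - 1) * (len(col_tot) - 1)))
    return s_sum, c, t


S, C, T = contingency_coefficients(TABLE)
print(f"S = {S:.1f}, C = {C:.2f}, T = {T:.2f}")
```

Working with unrounded independence values is a design choice for the sketch; rounding them as in Table 5.3 changes S only in the fourth significant figure.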

5.14. While such coefficients of contingency are a great convenience
in many forms of work, their use should not lead to a neglect of the more
detailed treatment of 5.5. Whether the coefficients be calculated or no,
every table should always be examined with care to see if it exhibits any
apparently significant peculiarities in the distribution of frequency, e.g.
in the associations subsisting between $A_m$ and $B_n$ in limited universes.
A good deal of caution must be used in order not to be misled by casual
irregularities due to paucity of observations in some compartments of
the table, but important points that would otherwise be overlooked will
often be revealed by such a detailed examination.

5.15. Suppose, for example, that any four adjacent frequencies, say

$$
\begin{matrix}
(A_m B_n) & (A_{m+1} B_n) \\
(A_m B_{n+1}) & (A_{m+1} B_{n+1})
\end{matrix}
$$

are extracted from the general contingency table. If these are considered
as a table exhibiting the association between $A_m$ and $B_n$ in a universe
limited to $A_m$, $A_{m+1}$, $B_n$, $B_{n+1}$ alone, the association is positive, negative or
zero according as $(A_m B_n)/(A_{m+1} B_n)$ is greater than, less than, or equal
to the ratio $(A_m B_{n+1})/(A_{m+1} B_{n+1})$. The whole of the contingency table
can be analysed into a series of elementary groups of four frequencies like
the above, each one overlapping its neighbours, so that an s × t-fold table
contains (s − 1)(t − 1) such “tetrads,” and the associations in them all
can be very quickly determined by simply tabulating the ratios like
$(A_m B_n)/(A_{m+1} B_n)$, $(A_m B_{n+1})/(A_{m+1} B_{n+1})$, etc., or perhaps better, the
proportions $(A_m B_n)/\{(A_m B_n) + (A_{m+1} B_n)\}$, etc., for every pair of columns
or of rows, as may be most convenient. Taking the figures of Table 5.2
as an illustration, and working from the rows, the proportions run as
follows:

For rows 1 and 2.              For rows 2 and 3.
1768/2714     0.651             946/1061     0.892
 807/2194     0.368            1387/1825     0.760
 189/935      0.202             746/1034     0.721
  47/100      0.470              53/69       0.768

In both cases the first three ratios form descending series, but the fourth
ratio is greater than the second. The signs of the associations in the six
tetrads are, accordingly,

+   +   −
+   +   −

The negative sign in the two tetrads on the right is striking, the more so
as other tables for hair- and eye-colour, arranged in the same way, exhibit
just the same characteristic. But the peculiarity will be removed at once
if the fourth column be placed immediately after the first: if this be done,
i.e. if “red” be placed between “fair” and “brown” instead of at the
end of the colour-series, the sign of the association in all the elementary
tetrads will be the same. The colours will then run fair, red, brown,
black, and this would seem to be the more natural order, considering the
depth of the pigmentation.
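
The signs of the elementary tetrads can also be found from the cross-products of the four frequencies, which is equivalent to comparing the ratios. The Python sketch below is illustrative only (the function name `tetrad_signs` is hypothetical); it treats the Baden table first in its printed order and then with “red” moved next to “fair.”

```python
# Rows: blue, grey or green, brown eyes. Columns: fair, brown, black, red hair.
TABLE = [
    [1768, 807, 189, 47],
    [946, 1387, 746, 53],
    [115, 438, 288, 16],
]


def tetrad_signs(table):
    """Sign of the association in every elementary tetrad of adjacent frequencies."""
    signs = []
    for n in range(len(table) - 1):
        row = []
        for m in range(len(table[0]) - 1):
            # Positive when the product of the leading diagonal exceeds the other product.
            cross = table[n][m] * table[n + 1][m + 1] - table[n][m + 1] * table[n + 1][m]
            row.append("+" if cross > 0 else "-" if cross < 0 else "0")
        signs.append(row)
    return signs


original = tetrad_signs(TABLE)

# Rearranged so that the hair-colours run fair, red, brown, black.
order = [0, 3, 1, 2]
rearranged = tetrad_signs([[row[c] for c in order] for row in TABLE])

print("printed order:   ", original)
print("fair, red, brown, black:", rearranged)
```

An s × t-fold table gives (s − 1)(t − 1) signs, here six, arranged in the same relative positions as the tetrads themselves.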

Isotropic Contingency Tables. 

5.16. A distribution of frequency of such a kind that the association
in every elementary tetrad is of the same sign, possesses several useful
and interesting properties, as shown in the following theorems. It will be
termed an isotropic distribution.

(1) In an isotropic distribution the sign of the association is the same not
only for every elementary tetrad of adjacent frequencies, but for every set of
four frequencies in the compartments common to two rows and two columns,
e.g. $(A_m B_n)$, $(A_{m+p} B_n)$, $(A_m B_{n+q})$, $(A_{m+p} B_{n+q})$.

For suppose that the sign of association in the elementary tetrads is
positive, so that

$$
(A_m B_n)(A_{m+1} B_{n+1}) > (A_{m+1} B_n)(A_m B_{n+1})
$$

and similarly,

$$
(A_{m+1} B_n)(A_{m+2} B_{n+1}) > (A_{m+2} B_n)(A_{m+1} B_{n+1})
$$

Then multiplying up and cancelling, we have:

$$
(A_m B_n)(A_{m+2} B_{n+1}) > (A_{m+2} B_n)(A_m B_{n+1})
$$

That is to say, the association is still positive though the two columns
$A_m$ and $A_{m+2}$ are no longer adjacent.

(2) An isotropic distribution remains isotropic in whatever way it may
be condensed by grouping together adjacent rows or columns.

Thus from the first and third inequalities above we have, adding:

$$
(A_m B_n)[(A_{m+1} B_{n+1}) + (A_{m+2} B_{n+1})] > (A_m B_{n+1})[(A_{m+1} B_n) + (A_{m+2} B_n)]
$$

that is to say, the sign of the elementary association is unaffected by
throwing the (m + 1)th and (m + 2)th columns into one.

(3) As the extreme case of the preceding theorem, we may suppose
both rows and columns grouped and regrouped until only a 2 × 2-fold
table is left; we then have the theorem:

If an isotropic distribution be reduced to a fourfold distribution in any
way whatever, by addition of adjacent rows and columns, the sign of the
association in such fourfold table is the same as in the elementary tetrads of
the original table.

The case of complete independence is a special case of isotropy.
For if

$$
(A_m B_n) = \frac{(A_m)(B_n)}{N}
$$

for all values of m and n, the association is evidently zero for every tetrad.
Therefore the distribution remains independent in whatever way the
table be grouped, or in whatever way the universe be limited by the
omission of rows or columns. The expression “complete independence”
is therefore justified.

From the work of the preceding section we may say that Table 5.2
is not isotropic as it stands, but may be regarded as a disarrangement of
an isotropic distribution. It is best to rearrange such a table in isotropic
order, as otherwise different reductions to fourfold form may lead to
associations of different sign, though of course they need not necessarily
do so.
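
Theorem (3) can be illustrated on the Baden frequencies once the hair-colours are arranged in the order fair, red, brown, black. The Python sketch below condenses the table to every fourfold form obtainable by pooling adjacent rows and adjacent columns and records the sign of the association each time. The function name `fourfold_sign` is hypothetical.

```python
# Rows: blue, grey or green, brown eyes. Columns: fair, red, brown, black hair.
ISOTROPIC = [
    [1768, 47, 807, 189],
    [946, 53, 1387, 746],
    [115, 16, 438, 288],
]


def fourfold_sign(table, row_cut, col_cut):
    """Sign of association after pooling rows before/after row_cut and columns likewise."""
    n_rows, n_cols = len(table), len(table[0])
    a = sum(table[i][j] for i in range(row_cut) for j in range(col_cut))
    b = sum(table[i][j] for i in range(row_cut) for j in range(col_cut, n_cols))
    c = sum(table[i][j] for i in range(row_cut, n_rows) for j in range(col_cut))
    d = sum(table[i][j] for i in range(row_cut, n_rows) for j in range(col_cut, n_cols))
    cross = a * d - b * c
    return "+" if cross > 0 else "-" if cross < 0 else "0"


# Every way of cutting the 3 x 4 table into a 2 x 2 table by adjacent grouping.
signs = {
    (row_cut, col_cut): fourfold_sign(ISOTROPIC, row_cut, col_cut)
    for row_cut in range(1, len(ISOTROPIC))
    for col_cut in range(1, len(ISOTROPIC[0]))
}
print(signs)
```

For a 3 × 4-fold table there are 2 × 3 = 6 such condensations, and for this arrangement all of them show the same sign as the elementary tetrads.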

5.17. The following will serve as an illustration of a table that is not
isotropic, and cannot be rendered isotropic by any rearrangement of the
order of rows and columns:

Table 5.4. Showing the Frequencies of Different Combinations of
Eye-colours in Father and Son.

(Data of Sir F. Galton, from Karl Pearson, Phil. Trans., A, vol. 195,
1900, p. 158; classification condensed.)

1. Blue. 2. Blue-green, grey. 3. Dark grey, hazel. 4. Brown.

                       Father's Eye-colour.
Son's
Eye-colour.      1.      2.      3.      4.    Total.

    1.          194      70      41      30      335
    2.           83     124      41      36      284
    3.           25      34      55      23      137
    4.           56      36      43     109      244

  Total         358     264     180     198     1000




The following are the ratios of the frequency in column m to the sum
of the frequencies in columns m and m+1:

                   Columns.
Row.     1 and 2.    2 and 3.    3 and 4.

 1.       0.735       0.631       0.577
 2.       0.401       0.752       0.532
 3.       0.424       0.382       0.705
 4.       0.609       0.456       0.283


The order in which the ratios run is different for each pair of columns,
and it is accordingly impossible to make the table isotropic. The
distribution of signs of association in the several tetrads is:

    +   -   +
    -   +   -
    -   -   +

The distribution is a curious one, the associations in tetrads round the
diagonal of the whole table being so markedly positive, and those in the
immediately adjacent tetrads equally markedly negative. Neglecting the
other signs, this is the effect that would be produced by taking an isotropic
distribution and then increasing the frequencies in the diagonal
compartments by a sufficient percentage. Comparison of the given table with
others from the same source shows that the peculiarity is common to the
great majority of the tables, and accordingly its origin demands
explanation. Were such a table treated by the method of the contingency
coefficient, or a similar summary method, alone, the peculiarity might not
be remarked.
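The examination of the tetrads can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the cell values are those of the eye-colour table as reconciled with its marginal totals. A table is treated here as isotropic when no two elementary tetrads show opposite signs.

```python
# Frequencies of the eye-colour table (rows: son's eye-colour 1-4;
# columns: father's eye-colour 1-4), reconciled with the marginal totals.
EYE_COLOUR = [
    [194, 70, 41, 30],
    [83, 124, 41, 36],
    [25, 34, 55, 23],
    [56, 36, 43, 109],
]


def tetrad_signs(table):
    """Return the sign (+, - or 0) of the association in every elementary tetrad."""
    signs = []
    for i in range(len(table) - 1):
        row = []
        for j in range(len(table[0]) - 1):
            # Cross-product difference of the 2 x 2 block of adjacent cells.
            d = table[i][j] * table[i + 1][j + 1] - table[i][j + 1] * table[i + 1][j]
            row.append("+" if d > 0 else "-" if d < 0 else "0")
        signs.append(row)
    return signs


def is_isotropic(table):
    """True when the elementary tetrads never show both + and - signs."""
    found = {s for row in tetrad_signs(table) for s in row}
    return not ({"+", "-"} <= found)


for row in tetrad_signs(EYE_COLOUR):
    print(" ".join(row))
print(is_isotropic(EYE_COLOUR))
```

Testing whether a rearrangement of rows and columns could make a table isotropic would need a further search over orderings, which this sketch does not attempt.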

Complete Independence in Contingency Tables. 

5.18. It may be noted that in the case of complete independence the
distribution of frequency in every row is similar to the distribution in the
row of totals, and the distribution in every column similar to that in the
column of totals; for in, say, the column A_n the frequencies are given by
the relations:

(A_n B_1) = (A_n)(B_1)/N,   (A_n B_2) = (A_n)(B_2)/N,

and so on. This property is of special importance in the theory of variables.

Homogeneous and Heterogeneous Classification. 

5.19. The classifications both of this and of the preceding chapters
have one important characteristic in common, viz. that they are, so to
speak, “homogeneous”, the principle of division being the same for all
the sub-classes of any one class. Thus A's and α's are both subdivided
into B's and β's, A_1's, A_2's, . . . A_s's into B_1's, B_2's, . . . B_t's, and
so on. Clearly this is necessary in order to render possible those
comparisons on which the discussions of associations and contingencies depend.
If we only know that amongst the A's there is a certain percentage of B's,
and amongst the α's a certain percentage of C's, there are no data for any
conclusion.





Many classifications are, however, essentially of a heterogeneous
character, e.g. biological classifications into orders, genera and species;
the classifications of the causes of death in vital statistics and of
occupations in the census. To take the last case as an illustration, the 1931
census of England and Wales divides occupations into 32 classes. Some
of these are not further subdivided, e.g. “Fishermen.” Others are
subdivided into further general classes; e.g. Class 1 is divided into (1)
Employers, (2) Furnacemen, (3) Foundry Workers, (4) Smiths, (5) Metal
Machinists, (6) Fitters and (7) Other Workers. These sub-heads are
necessarily peculiar to the class under which they occur and their number
is arbitrary and variable, and different for each main heading; but so long
as the classification remains purely heterogeneous, however complex it may
become, there is no opportunity for any discussion of causation within the
limits of the matter so derived. It is only when a homogeneous division
is in some way introduced that we can begin to speak of associations and
contingencies.

5.20. This may be done in various ways according to the nature of
the case. Thus the relative frequencies of different botanical families,
genera or species may be discussed in connection with the topographical
characters of their habitats (desert, marsh or heath) and we may observe
statistical associations between given genera and situations of a given
topographical type. The causes of death may be classified according to sex,
or age, or occupation, and it then becomes possible to discuss the
association of a given cause of death with one or other of the two sexes, with a
given age-group or with a given occupation. Again, the classifications of
deaths and of occupations are repeated at successive intervals of time; and
if they have remained strictly the same, it is also possible to discuss the
association of a given occupation or a given cause of death with the earlier
or later year of observation, i.e. to see whether the numbers of those
engaged in the given occupation or succumbing to the given cause of death
have increased or decreased. But in such circumstances the greatest
care must be taken to see that the necessary condition as to the identity of
the classifications at the two periods is fulfilled, and unfortunately it very
seldom is fulfilled. All practical schemes of classification are subject to
alteration and improvement from time to time, and these alterations,
however desirable in themselves, render a certain number of comparisons
impossible. Even where a classification has remained verbally the same,
it is not necessarily really the same; thus in the case of the causes of death,
improved methods of diagnosis may transfer many deaths from one heading
to another without any change in the incidence of the disease, and so bring
about a virtual change in the classification. In any case, heterogeneous
classification should be regarded only as a partial process, incomplete until
a homogeneous division is introduced either directly or indirectly, e.g. by
repetition.

Manifold Classification as a Series of Dichotomies. 

5.21. From a theoretical point of view, manifold classification can be 
regarded as compounded of a series of dichotomies. Take, for example, a 
case we have already considered, that of the classification of a universe of 
men according to the eye-colours blue, grey, brown and green. We could 
have produced this fourfold division by three dichotomies. In fact, 




dividing the universe first into those with blue eyes and those with not-blue
eyes we get two classes. Then dividing again into those with brown eyes
and those with not-brown eyes we get four classes. This operation on the
class of blue-eyed men, however, results in one zero class, because there are
no men with blue eyes which are at the same time brown, and one class
which is, in fact, the class of blue-eyed men. Virtually, therefore, we have
three classes: those with blue eyes, those with brown eyes, and the
remainder. If we now dichotomise each of these into those with grey eyes
and those with not-grey eyes, we shall again get, neglecting the zero classes,
the four classes of the manifold classification.

5.22. It follows from this that any manifold classification can be
regarded as produced by a succession of divisions in which, at each stage,
each individual could fall into one of two alternatives, A or not-A.

Put in another way, this means that the possible answers to an
unambiguous question can be reduced to a succession of answers of either
“yes” or “no.” For instance, suppose the question is, “How old are you,
in years?” We can replace this question by the succession of questions,
“Are you one year old?” “Are you two years old?” . . . “Are you
120 years old?” An answer of “47” to the first-mentioned question can
then be expressed as an answer of “No” to the first 46 of these questions,
“Yes” to the 47th and “No” to the rest.

Similarly, an answer to the question, “What is your name?” can be
reduced to the questions, “Is the first letter of your name A?” “Is the
first letter B?” . . . “Is the second letter A?” and so on. Replies to
a more general question can be reduced to the same form by a convenient
classification; e.g. the replies to the question, “Are you in favour of war?”
can be classified in the four forms: “Favourable without qualification,”
“Favourable with some qualification,” “Unfavourable without
qualification,” “Unfavourable with some qualification,” and the answers to the
questions can be reduced to answers “yes” or “no” to the questions, “Are
you, without qualification, in favour of war?” and so on.
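The reduction of the age question to a succession of yes-or-no answers can be sketched as follows. The function names and the upper limit of 120 questions are assumptions made purely for illustration.

```python
def answers_for_age(age, max_age=120):
    """Encode an age in years as answers to "Are you k years old?" for k = 1..max_age."""
    if not 1 <= age <= max_age:
        raise ValueError("age outside the range covered by the questions")
    return [k == age for k in range(1, max_age + 1)]


def age_from_answers(answers):
    """Recover the age from the single "Yes" among the answers."""
    return answers.index(True) + 1


answers = answers_for_age(47)
# Number of "No" answers before the "Yes", the 47th answer, and the "No" answers after it.
print(answers[:46].count(False), answers[46], answers[47:].count(False))
print(age_from_answers(answers))
```

The list of answers carries exactly the same information as the single reply “47”; it is merely spread over 120 dichotomies.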

Recording Classified Information on Punched Cards. 

5.23. The information about an individual, considered as a member
of a universe, is information whether he does or does not fall into the
alternative classes which, as we have just seen, compose the most general
homogeneous classification of the universe. If we imagine each individual
filling in a questionnaire about himself, the totality of answers may, by
suitably expressing the questions, be expressed as a number of “yes's” and
“no's,” and these replies express all the information about the individual.

This simple fact allows us to record the data in a most convenient way. 
Each individual is allotted a card, which is divided into a number of cells. 
Each cell corresponds to one of the dichotomies or simple questions the 
answers to which constitute the information. If the answer is “ Yes,” a 
hole is punched in the cell; if the answer is “ No,” the cell is left untouched. 

The card of any individual will thus be like a complicated tram ticket,
with holes punched in various places. The punching is usually performed
either by hand with a ticket collector's punch, or with a machine similar
in principle to the typewriter. The totality of punched cards forms a
miniature of our universe: each individual has a card on which is recorded
the whole of the information about him.




The use of this system lies in the fact that punched cards are easily 
handled and sorted by machinery. If, for example, we want to know a 
particular class-frequency, we can adjust certain electrical, pneumatic or 
mechanical stops, and the machine will segregate all the cards in the class 
and count them for us. 

5.24. A similar device has been applied to the sorting of data by hand.
A card is prepared with a row of circular holes punched all the way round
near its edge, but so that no hole is open to the edge. Each hole
corresponds to a dichotomy or a simple question. When preparing the card, if
the individual falls into the A class, or the answer to the question is “Yes,”
a piece is clipped out of the card so that the hole is now open to the edge.
If the individual falls into the not-A class, or the answer to the question is
“No,” the hole is left alone.

To separate the A's from the not-A's, or the “yes” cards from the
“no” cards, they are arranged in a vertical plane so that corresponding
cells are similarly placed. A skewer is then inserted in the appropriate
hole and lifted. The not-A cards are lifted out, whilst the A cards fall
away, since the piece of card between the hole and the edge has been cut
away. By repeating the operation with the skewer in the appropriate
holes we can isolate the cards in any given class. These can then be counted
and the size of the class-frequency determined.
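The sorting by skewer can be imitated in a short program. The sketch below is illustrative only: each card is represented, by assumption, as a label together with the set of holes that have been clipped open to the edge, and the pack of five cards is hypothetical.

```python
def skewer_pass(cards, hole):
    """Pass the skewer through `hole` of every card and lift.

    Each card is a (label, clipped) pair, where `clipped` is the set of holes
    clipped open to the edge (answer "Yes"). Clipped cards fall away; the
    others stay on the skewer and are lifted out.
    """
    fallen = [card for card in cards if hole in card[1]]
    lifted = [card for card in cards if hole not in card[1]]
    return fallen, lifted


def class_frequency(cards, holes):
    """Count the cards that answer "Yes" to every question in `holes`."""
    remaining = cards
    for hole in holes:
        # Keep only the cards that fell away at this hole, then repeat.
        remaining, _ = skewer_pass(remaining, hole)
    return len(remaining)


# Hypothetical pack: hole 0 = "blue eyes?", hole 1 = "fair hair?".
pack = [
    ("card 1", {0, 1}),
    ("card 2", {0}),
    ("card 3", {1}),
    ("card 4", set()),
    ("card 5", {0, 1}),
]
print(class_frequency(pack, [0]))      # cards with blue eyes
print(class_frequency(pack, [0, 1]))   # cards with blue eyes and fair hair
```

To isolate a class defined by a “No” answer, the lifted cards rather than the fallen ones would be carried forward at that hole.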

5.25. The labour of punching cards and the expense of machinery is
justified only when the number of individuals is large and the number of
ultimate classes is also large. This arises, for example, in the taking of
a census of population.

Numerically Defined Attributes. 

5.26. The attributes we have instanced in the foregoing pages have
usually been of a qualitative kind. The methods described are, however,
applicable to data classified on a numerical basis. Consider, for example,
the following table:


Table 5.5. Number of Families Deficient in Room Space in 95 Crowded London Wards.
(Census of 1931, Housing Report.)



The distinction between successive rows and columns is not quite of the
kind of Table 5.2. In the latter, for instance, we drew a line between black
hair and brown, a line which could be drawn by anybody who was not
colour-blind, although there may be border-line cases of mixed colours which
would present difficulty. But in Table 5.5 above the line is drawn by
counting, a much more precise operation. Moreover, the rows and
columns have a certain natural order given by the numerical sequence.
It would seem absurd to put the column which is headed “two rooms”
between those headed “three rooms” and “four rooms,” but in Table 5.2
there is no a priori reason for putting “black” between “brown” and
“red.”

5.27. We might also have a contingency table in which the attributes
were measurable quantities, and the rows and columns of the table
determined by ranges of those quantities. This, again, is slightly different
from the case of the previous paragraph, for these ranges are to a large
extent arbitrary, whereas in Table 5.5 the indivisible nature of the room
compels us to count in units of at least one room.

5.28. Finally, we may have a table which is given by one qualitative
attribute and one quantitative attribute. Consider, for example, the
following:

Table 5.6. Weight and Mentality in a Selection of Criminals.

(Data from M. H. Whiting, “On the Association of Temperature, Pulse and Respiration
with Physique and Intelligence in Criminals,” Biometrika, vol. 11, pp. 1-37.)

                                 Weight (lbs.).
Mentality.   90-120.   120-130.   130-140.   140-150.   150 and upward.   Totals.

Normal          …          …        113        106           121            396
Weak            …          …         15         15            18             97

Totals          …          …        128        121           139            493

5.29. The methods of the previous chapters are applicable also to such
tables. Numerically measurable quantities may, however, be treated by
other methods, to which we shall come in due course. We mention the
point here in order to remove any possible idea that the theory of attributes
is concerned solely with qualitative classification, and is not appropriate
to the more precise data given by a numerically assessable attribute.


SUMMARY. 

1. The division of a universe according to an attribute A into a number
of heads is called manifold classification. This is an extension of the idea
of dichotomy, in which the universe is divided into two parts only.

2. Manifold classification according to two attributes A and B gives
rise to a contingency table.

3. Association in a contingency table may be examined by reducing it
in a number of ways to a 2 x 2 table.

4. The general nature of the association may be summarised by a
coefficient.






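5. We define

\delta_{mn} = (A_m B_n) - (A_m B_n)_0

where (A_m B_n)_0 = (A_m)(B_n)/N is the frequency to be expected under
independence. The “square contingency” is given by:

\chi^2 = \sum \delta_{mn}^2 / (A_m B_n)_0

The “mean square contingency” by:

\phi^2 = \chi^2 / N

6. Pearson's “coefficient of mean square contingency” is defined by:

C = \sqrt{\chi^2 / (N + \chi^2)} = \sqrt{\phi^2 / (1 + \phi^2)}

7. Tschuprow's “coefficient of contingency” is defined, for a table of
s rows and t columns, by:

T^2 = \phi^2 / \sqrt{(s-1)(t-1)}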

8. Certain types of table, known as isotropic contingency tables, possess
special features of some importance.

9. Any manifold classification may be regarded as a succession of
dichotomies. This fact is the basis of the use of punched cards for
recording and analysing statistical data.

10. Manifold classification may arise not only from an attribute which
is specified under heads of a qualitative kind, but also from a quantitative
attribute specified by counting or measurement.
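The coefficients named in the summary can be computed mechanically from any contingency table. The following Python sketch uses a hypothetical function name and assumes that no row or column total is zero. It forms the independence frequency (A_m)(B_n)/N for each cell, sums the squared differences each divided by that frequency to obtain the square contingency, divides by N for the mean square contingency, and from these derives Pearson's C and Tschuprow's T.

```python
from math import sqrt


def contingency_coefficients(table):
    """Return (chi2, phi2, C, T) for a table given as a list of rows of frequencies.

    Assumes every row total and column total is greater than zero.
    """
    rows, cols = len(table), len(table[0])
    row_tot = [sum(r) for r in table]
    col_tot = [sum(table[i][j] for i in range(rows)) for j in range(cols)]
    n = sum(row_tot)

    chi2 = 0.0
    for i in range(rows):
        for j in range(cols):
            expected = row_tot[i] * col_tot[j] / n   # independence frequency
            delta = table[i][j] - expected           # difference from independence
            chi2 += delta * delta / expected

    phi2 = chi2 / n                                  # mean square contingency
    c = sqrt(chi2 / (n + chi2))                      # Pearson's coefficient
    t = sqrt(phi2 / sqrt((rows - 1) * (cols - 1)))   # Tschuprow's coefficient
    return chi2, phi2, c, t


# Hypothetical 2 x 2 table, used only to exercise the function.
chi2, phi2, c, t = contingency_coefficients([[10, 20], [30, 40]])
print(round(chi2, 4), round(phi2, 6), round(c, 4), round(t, 4))
```

For a 2 x 2 table the divisor in T is unity, so that T is simply the square root of the mean square contingency.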


EXERCISES. 

5.1. (Data from Karl Pearson, “On the Inheritance of the Mental and Moral
Characters in Man,” Jour. of the Anthrop. Inst., vol. 33, and Biometrika, vol. 3.)
Find the coefficient of contingency (coefficient of mean square contingency) for
the two tables below, showing the resemblance between brothers for athletic
capacity and between sisters for temper. Show that neither table is even
remotely isotropic. (As stated in 5.11, the coefficient of contingency should not
as a rule be used for tables smaller than 5 x 5-fold: these small tables are given
to illustrate the method, while avoiding lengthy arithmetic.)

A. Athletic Capacity.

                                First Brother.
Second Brother.     Athletic.   Betwixt.   Non-athletic.   Total.

Athletic               906         20          140          1066
Betwixt                 20         76            9           105
Non-athletic           140          9          370           519

Total                 1066        105          519          1690


B. Temper.

                               First Sister.
Second Sister.      Quick.   Good-natured.   Sullen.   Total.

Quick                198         177            77       452
Good-natured         177         996           165      1338
Sullen                77         165           120       362

Total                452        1338           362      2152


5.2. Calculate T and C for the following table, and trace the association
between the progress of building and the urban character of the district:

Houses in England and Wales. (Census of 1901, Summary Table X.)
(000's omitted.)

                                 Inhabited.   Uninhabited.   Building.   Total.

Adm. County of London               571            40            5        616
Other urban districts              4064           285           45       4394
Rural districts                    1625           124           12       1761

Total for England and Wales        6260           449           62       6771


5.3. Show that for a given s and t, C and T are equal for two values of \phi^2,
one of which is zero; that for \phi^2 between these values C > T; and that for \phi^2
greater than the higher value T > C.

5.4. Find whether the following contingency table is isotropic, and if it is
not, ascertain whether it can be arranged in an isotropic form:

            A_1    A_2    A_3    A_4    A_5    Totals.

B_1          90     43     17     27     16      193
B_2         235     88     44     60     40      467
B_3         300    103     54     71     48      576

Totals      625    234    115    158    104     1236


5.5. Calculate C and T for the table of the previous example. 
5.6. Show that in a positively isotropic contingency table,


*i, 


WA). 


and is 


6 n 


5.7. 1000 subjects of English, French, German, Italian and Spanish
nationality were asked to name their preference among the music of those five
nationalities. The results were as follows (1 = English, 2 = French, 3 = German,
4 = Italian, 5 = Spanish):

                       Nationality of Music Preferred.
Nationality
of Subject.      1.     2.     3.     4.     5.    Totals.

    1.           32     16     75     47     30      200
    2.           10     67     42     41     40      200
    3.           12     23    107     36     22      200
    4.           16     20     44     76     44      200
    5.            8     53     30     43     66      200

  Totals         78    179    298    243    202     1000

Discuss the association between the nationality of the subject and the
nationality of the music preferred.

5.8. In Table 5.6 calculate C and T, and discuss the light thrown by this
table on the association between physique and intelligence in the criminals of
the data.

5.9. Show that for a 2 x 2 contingency table in which the frequencies are
(A_1 B_1) = a, (A_1 B_2) = b, (A_2 B_1) = c and (A_2 B_2) = d,

\chi^2 = \frac{(a+b+c+d)(ad-bc)^2}{(a+b)(c+d)(b+d)(a+c)}

and hence find C and T in terms of a, b, c, d.

5.10. In a paper discussing whether laterality of hand is associated with
laterality of eye (measured by astigmatism, acuity of vision, etc.) T. L. Woo
obtained the following results (Biometrika, vol. 20A, pp. 70-118):

                     Ocular Laterality for General Astigmatism.

                  “Left-eyed.”   Ambiocular.   “Right-eyed.”   Totals.

Left-handed            34             62             28           124
Ambidextrous           27             28             20            75
Right-handed           57            105             52           214

Totals                118            195            100           413

Show that laterality of eye is only slightly associated with laterality of
hand.





CHAPTER 6.


FREQUENCY-DISTRIBUTIONS. 

Variables. 

6.1. As we emphasised at the close of the last chapter, the methods
of the theory of attributes are applicable to all observations, whether
qualitative or quantitative. We have now to proceed to the
consideration of special processes adapted to the treatment of quantitative data,
but not as a rule available for the discussion of purely qualitative
observations (though there are some important exceptions to this statement, as
suggested in 1.2).

Numerical measurement is applied only to a quantity which can
present more than one numerical value. Otherwise there would be no
point in measuring it. Such a quantity is therefore called a variable,¹
and this section of our work may be termed the theory of variables.

As common examples of variables which are subject to statistical
treatment we may cite birth- and death-rates, prices, wages, barometer
readings, rainfall records, and measurements or enumerations (e.g. of
glands, spines or petals) on animals or plants.

Quantities which can take any numerical value within a certain range
are called continuous variables. Such, for example, are birth-rates
and barometric readings. Quantities which can take only discrete values
are called discontinuous variables. This class, for instance, would
include data of the number of petals on flowers or the number of rooms
in a house.

Frequency -distributions. 

6.2. If some hundreds or thousands of values of a variable have
been noted merely in the arbitrary order in which they occur, the mind
cannot properly grasp the significance of the record. We must condense
the data by some method of ranking or classification before their
characteristics can be comprehended.

One way of doing this would be to dichotomise the data by classifying
the individuals as A's or not-A's according as the value of the variable
exceeded or fell short of some given value. But this is too crude, and
the sacrifice of information is too great. A manifold classification,
however, avoids the crudity of the dichotomous form, since the classes
may be made as numerous as we please. Moreover, numerical
measurements lend themselves with peculiar readiness to a manifold classification,
for the class limits can be conveniently and precisely defined by assigned
values of the variable.

6.3. For convenience, the values of the variable chosen to define 
the successive classes should be equidistant, so that the numbers of 
observations in different classes are comparable. 

¹ It is also called a variate. We shall use the two terms as synonymous.

The interval chosen for classifying is called the class-interval, and
the frequency in a particular class-interval is called a class-frequency.

Thus, for measurements of stature, the class-interval might be 1 inch,
or 2 centimetres, and the class-frequencies would be the numbers of
individuals whose statures fell within each successive inch or each successive
2 centimetres of the scale; returns of birth- or death-rates might be
grouped to the nearest unit per thousand of the population; returns of
wages might be classified to the nearest shilling, or, if it is desired to obtain
a more condensed table, to the nearest five or ten shillings.
Discontinuous variables to a great extent determine their own class-intervals,
which must either be equal in width to the unit amount of variation,
or equal to some multiple of it. For example, in enumerations of the
number of rooms in a house we naturally take our class-interval to be
one room; in enumerations of the petals on a flower we may take one
petal, or, if the range of variation is very great, say five petals or more.

6.4. The manner in which the class-frequencies are distributed over 
the class-intervals is spoken of as the frequency-distribution of the 
variable. 

A few illustrations will make clearer the nature of such
frequency-distributions, and the service which they render in summarising a long
and complex record.

(a) Table 6.1. In this illustration the birth-rates per thousand of
the population in 1933 of 1567 local government areas of England have
been classified to the nearest unit; i.e. the number of districts has been
counted in which the birth-rate was between 1.5 per thousand and 2.5,
between 2.5 and 3.5, and so on. The frequency-distribution is shown by
the table.

Table 6.1. Showing the Number of Local Government Areas in England with Specified
Birth-rates per Thousand of Population. (Material from the Registrar-General's
Statistical Review of England and Wales for 1933.)

               Number of Districts                    Number of Districts
Birth-rate.    with Birth-rate         Birth-rate.    with Birth-rate
               Between Limits                         Between Limits
               Stated.                                Stated.

  1.5- 2.5            1                 13.5-14.5          271
  2.5- 3.5            2                 14.5-15.5          190
  3.5- 4.5            2                 15.5-16.5          127
  4.5- 5.5            3                 16.5-17.5           89
  5.5- 6.5            7                 17.5-18.5           78
  6.5- 7.5            9                 18.5-19.5           37
  7.5- 8.5           14                 19.5-20.5           21
  8.5- 9.5           41                 20.5-21.5           17
  9.5-10.5           83                 21.5-22.5            4
 10.5-11.5          131                 22.5-23.5            4
 11.5-12.5          192                 23.5-24.5            2
 12.5-13.5          242
                                        Total             1567

Although a glance through the original returns, which are spread amongst
many other figures over 42 pages, fails to convey any definite impression,
a brief inspection of the above table brings out a number of important
points. Thus, we see that the birth-rates range, in round numbers, from
2 to 24 per thousand; that the birth-rates in some 75 per cent. of the
districts lie within the narrow limits 10.5 to 16.5, the rates most frequent
being near 14; and so on. It may be remarked that some of the areas
are very small, with no more than 10 or 20 births, and these account
mainly for the extremely divergent rates.

(b) Table 6.2. The numbers of stigmatic rays on a number of Shirley
poppies were counted. As the range of variation is not great, the unit
is taken as the class-interval. The frequency-distribution is given by
the following table:

Table 6.2. Showing the Frequencies of Seed Capsules on certain Shirley Poppies, with
Different Numbers of Stigmatic Rays. (Cited from Biometrika, vol. 2, 1902, p. 89.)

Number of        Number of Capsules        Number of        Number of Capsules
Stigmatic        with said Number of       Stigmatic        with said Number of
Rays.            Stigmatic Rays.           Rays.            Stigmatic Rays.

    6                    3                    14                  302
    7                   11                    15                  234
    8                   38                    16                  128
    9                  106                    17                   50
   10                  152                    18                   19
   11                  238                    19                    3
   12                  305                    20                    1
   13                  315
                                             Total               1905


The numbers of rays range from 6 to 20, the most usual numbers
being 12, 13 or 14.

(c) Table 6.3. 206 screws were taken as they came off the lathe
which was turning them. Their lengths, which should have been 1 inch,
were measured. The following table shows the screws classified by the
number of thousandths of an inch by which they exceeded or fell short
of 1 inch in length:

Table 6.3. Showing the Frequencies of Screws Classified according to the Extent to which
they Varied in Length from the Standard of 1 Inch. (Unpublished data, A. M.
Lester.)

Difference in Length                  Difference in Length
from 1 Inch             Number of     from 1 Inch             Number of
(Thousandths of an      Screws.       (Thousandths of an      Screws.
Inch).                                Inch).

   -6 to -5                 1            +1 to +2                34
   -5 to -4                 4            +2 to +3                25
   -4 to -3                11            +3 to +4                16
   -3 to -2                22            +4 to +5                 8
   -2 to -1                25            +5 to +6                 1
   -1 to  0                27
    0 to +1                32            Total                  206



It will be seen that the maximum frequency, i.e. 34, occurs for screws
from 0.001 to 0.002 inch in excess of the standard. About 80 per cent.
lie in the range three-thousandths of an inch on either side of the standard.

6.5. Expanding slightly the brief description we have given, tables
setting out frequency-distributions are formed in the following way:

(1) The magnitude of the class-interval is first fixed. In Tables 6.1,
6.2 and 6.3 one unit was chosen.

(2) The position or origin of the intervals must then be determined;
e.g. in Table 6.1 we must decide whether to take as intervals 9-10, 10-11,
11-12, etc., or 9.5-10.5, 10.5-11.5, 11.5-12.5, etc.

(3) This choice having been made, the complete scale of intervals is
fixed and the observations are classified accordingly.

(4) The process of classification being finished, a table is drawn up on
the general lines of Tables 6.1-6.3, showing the total number of
observations in each class-interval.

It is necessary to make a few remarks about each of these heads. 
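The four steps may be sketched in Python as follows. The function name, the handful of observations and the convention that a value falling exactly on a limit is placed in the upper class are all assumptions made for illustration only.

```python
import math
from collections import Counter


def frequency_distribution(values, width, origin):
    """Classify `values` into intervals of the given width starting from `origin`.

    Returns a list of (lower limit, upper limit, class-frequency) rows covering
    every interval from the lowest occupied class to the highest.
    A value lying exactly on a limit is assigned to the upper interval.
    """
    # Steps (1)-(3): width and origin fix the scale; each value gets a class index.
    counts = Counter(math.floor((v - origin) / width) for v in values)
    # Step (4): draw up the table, including any empty intermediate classes.
    return [
        (origin + k * width, origin + (k + 1) * width, counts.get(k, 0))
        for k in range(min(counts), max(counts) + 1)
    ]


# Hypothetical observations, classified to the nearest unit (origin 9.5, width 1).
rates = [9.6, 10.2, 10.4, 10.7, 11.4, 12.9]
for lower, upper, freq in frequency_distribution(rates, width=1, origin=9.5):
    print(f"{lower}-{upper}: {freq}")
```

Choosing the origin at 9.5 rather than 9 gives classes whose mid-values are whole numbers, which is the alternative considered in step (2).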

Magnitude of Class-interval. 

6.6. As already remarked, in cases where the variation proceeds by
discrete steps of considerable magnitude as compared with the range of
variation, there is very little choice as regards the magnitude of the
class-interval. The unit will in general have to serve. But if the variation
be continuous, or at least take place by discrete steps which are small
in comparison with the whole range of variation, there is no such natural
class-interval, and its choice is a matter for judgment.

The two conditions which guide the choice are these: (a) We desire
to be able to treat all the values assigned to any one class, without serious
error, as if they were equal to the mid-value of the class-interval, e.g.
as if the birth-rate of every district in the first class of Table 6.1 were
exactly 2.0, the birth-rate of every district in the second class 3.0, and
so on; (b) for convenience and brevity we desire to make the interval
as large as possible, subject to the first condition. These conditions will
generally be fulfilled if the interval be so chosen that the whole number
of classes lies between 15 and 25. A number of classes less than, say,
ten leads in general to very appreciable inaccuracy, and a number over,
say, thirty makes a somewhat unwieldy table. A preliminary inspection
of the record should accordingly be made and the highest and lowest
values be picked out. Dividing the difference between these by, say,
twenty-five, we have an approximate value for the interval. The actual
value should be the nearest integer or simple fraction.
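The rule of thumb just stated can be sketched as a small calculation. What counts as a “simple” value is a matter of judgment; the set used below (1, 2, 2.5 or 5 times a power of ten) is an assumption of this sketch, as is the function name.

```python
import math


def suggest_class_interval(lowest, highest, target_classes=25):
    """Return (raw, simple): the range divided by the target number of classes,
    and the nearest "simple" value, here taken as 1, 2, 2.5 or 5 times a power of ten.

    Assumes highest > lowest.
    """
    raw = (highest - lowest) / target_classes
    exponent = math.floor(math.log10(raw))
    candidates = [
        m * 10.0 ** e
        for e in (exponent, exponent + 1)
        for m in (1, 2, 2.5, 5)
    ]
    simple = min(candidates, key=lambda c: abs(c - raw))
    return raw, simple


# Birth-rates running from about 1.5 to 24.5 per thousand.
print(suggest_class_interval(1.5, 24.5))
```

With the extreme values 1.5 and 24.5 the raw quotient is 0.92, for which the nearest simple value is the unit.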

Position of Intervals. 

6.7. The position or starting-point of the intervals is, as a rule,
more or less a matter of indifference. It can therefore be chosen as is most
convenient for the particular case under discussion, e.g. so that the limits
of the intervals are integers, or, as in Table 6.1, so that the mid-values are
integers. It may also be chosen so that no limits correspond exactly
to any recorded value of the variate, in order to obviate any difficulty
in deciding to which class a particular individual should be assigned
(cf. 6.9).



The location of the intervals is, however, important when the values
of the variate tend for some reason to cluster round particular values.
Such a case arises, for instance, in age returns, owing to the tendency
to state a round number where the true age is unknown, or a reluctance
to admit one's real age.¹ It is also common wherever there is some
doubt as to the final digit in reading a scale, and scope is given to the
idiosyncrasies of the observer.

Table 6.4 shows results for four observers as illustrations, the 
frequencies being reduced for comparability to a total of 1000. Column A 
is based on measures by G. IJ. Yule, on drawings, to the nearest tenth of 
a millimetre. It is recognised, of course, that measures cannot really 

Table 6.4. Frequency-distributions of Final Digits in Measurements by Four Observers. 
(G. U. Yule, “On Reading a Scale,” Journal Royal Statistical Society, vol. 90, 1927, p. 570.) 

                    Frequency of Final Digit per 1000. 
Final Digit.          A.       B.       C.       D. 

     0               158      122      251      358 
     1                97       98       37       49 
     2               125       98       80       90 
     3                73       90       72       63 
     4                76      100       55       37 
     5                71      112      211      211 
     6                90       98       71       62 
     7                56       99       75       70 
     8               126      101       72       44 
     9               129       81       65       16 

Total               1001      999     1000     1000 

Actual 
observations        1268     3000     1000     1000 

be made to such a degree of precision ; but the measurer believed that 
he was making them carefully, and as they were made with a Zeiss scale, 
in which the divisions are ruled on the under side of a piece of plate-glass, 
readings were unaffected by parallax. Nevertheless, it will be seen that 
the zeros, and also 2, 8 and 9, were heavily over-emphasised: an odd 
selection of preferences! On the whole, the centre of the millimetre was 
neglected and measures piled up at the two ends. 

The data for columns B, C and D are all drawn from the same published 
report, and refer to sundry head measurements taken on the living subject. 
On the basis of a statement in the introduction to the report, it was possible 
to compile the data separately for the three assistants (B, C, D) who had 
done the actual measuring. It will be seen that B was rather good : there 
is a relatively slight excess at 0 and 5, but otherwise his measurements are 

¹ This effect is practically the same for men as for women. Cf. Table I in the 
Appendix to the paper cited in the heading to Table 6.4 above. 






fairly uniformly distributed. C was decidedly not good, rounding off nearly 
one measurement in two to the nearest centimetre or half-centimetre. D 
was simply outrageously bad, so bad that it might have been better not 
to publish his measurements. Nearly 57 per cent. of his measurements 
were made only to the nearest centimetre or half-centimetre, a quite 
inadequate degree of precision for head measurements often only a few 
centimetres in magnitude. 

When there is any possibility of clustering of variate values, it is as 
well to subject the data to a close examination before finally fixing on 
the method of classification. On the whole, the intervals should be 
arranged as far as possible so that the values round which the clustering 
occurs fall towards the interval mid-values. This procedure avoids 
sensible error in the assumption that the interval mid-value is approximately 
representative of the values of the class. 
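
The kind of close examination recommended here can be illustrated on three of the columns of Table 6.4. The sketch below, in Python, computes the share of readings ending in 0 or 5 and a crude measure of unevenness; the mean absolute departure from an even spread is an illustrative measure chosen for this example and is not one used in the text.

```python
# Final-digit frequencies per 1000 from Table 6.4, for digits 0 to 9.
per_1000 = {
    "A": [158, 97, 125, 73, 76, 71, 90, 56, 126, 129],
    "B": [122, 98, 98, 90, 100, 112, 98, 99, 101, 81],
    "D": [358, 49, 90, 63, 37, 211, 62, 70, 44, 16],
}

def share_of_0_and_5(freqs):
    """Fraction of readings whose final digit is 0 or 5."""
    return (freqs[0] + freqs[5]) / sum(freqs)

def mean_abs_departure(freqs):
    """Mean absolute departure from an even spread over the ten digits
    (an illustrative measure, assumed for this sketch)."""
    even = sum(freqs) / 10
    return sum(abs(f - even) for f in freqs) / 10

for name, freqs in per_1000.items():
    print(name, round(share_of_0_and_5(freqs), 3),
          round(mean_abs_departure(freqs), 1))
```

For observer D the share of readings ending in 0 or 5 comes to 0.569, the "nearly 57 per cent." remarked on above, and the unevenness increases from B to A to D.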

Classification. 

6.8. The scale of intervals having been fixed, the observations may 
be classified. If the number of observations is not large, it will be sufficient 
to mark the limits of successive intervals in a column down the left-hand 
side of a sheet of paper, and transfer the entries of the original record 
to this sheet by marking a 1 on the line corresponding to any class for 
each entry assigned thereto. It saves time in subsequent totalling if 
each fifth entry in a class is marked by a diagonal across the preceding 
four, or by leaving a space. 

The disadvantage in this process is that it offers no facilities for 
checking: if a repetition of the classification leads to a different result, 
there is no means of tracing the error. If the number of observations is 
at all considerable and accuracy is essential, it is accordingly better to 
enter the values observed on cards, one to each observation. These are 
then dealt out into packs according to their classes, and the whole work 
checked by running through the pack corresponding to each class, and 
verifying that no cards have been wrongly sorted. 

6.9. In some cases difficulties may arise in classifying, owing to 
the occurrence of observed values corresponding to class-limits. Thus, in 
compiling Table 6.1 some districts will have been noted with birth-rates 
entered in the Registrar-General's returns as 16.5, 17.5 or 18.5, any one 
of which might at first sight have been assigned indifferently 
to either of two adjacent classes. In such a case, however, where the 
original figures for numbers of births and population are available, the 
difficulty may be readily surmounted by working out the rate to another 
place of decimals; if the rate stated to be 16.5 proves to be 16.502, it 
will be sorted to the class 16.5-17.5; if 16.498, to the class 15.5-16.5. 
Birth-rates that work out to half-units exactly do not occur in this example, 
and so there is no real difficulty. 

In the case of Table 6.3, again, there is little difficulty in knowing the 
class to which an individual should be assigned. 

Difficulties of this type may, in fact, always be avoided if they are 
borne in mind in fixing the class-intervals, by fixing the intervals to a 
further place of decimals or a smaller fraction than the values in the 
original record. Thus, if statures are measured to the nearest centimetre, 
the class-intervals may be taken as 150.5-151.5, 151.5-152.5, etc.; if to the 




nearest eighth of an inch, the intervals may be 59 15/16-60 15/16, 
and so on. 

If the difficulty is not evaded in any of these ways, it is usual to assign 
one-half of an intermediate observation to each adjacent class, with the 
result that half-units occur in the class-frequencies (cf. Table 6.9, p. 98). 
The procedure is rough, but probably good enough for practical purposes ; 
strict precision is usually unattainable, for in point of fact the odd way in 
which different individuals read a scale, for example, renders it impossible 
to assign exact limits to intervals. 
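
The tallying of 6.8, together with the halving of intermediate observations just described, may be sketched as follows. The function name and the five birth-rates are hypothetical; exact fractions are used so that a value such as 16.5 is recognised as lying exactly on a class-limit, and it is assumed that the lowest class-limit lies below every observation.

```python
from collections import Counter
from fractions import Fraction

def classify(values, start, width):
    """Tally values into classes start + k*width to start + (k+1)*width.

    A value lying exactly on a class-limit adds one-half to each of the
    two adjacent classes. Assumes no value equals `start` itself.
    """
    start, width = Fraction(start), Fraction(width)
    counts = Counter()
    for v in values:
        offset = Fraction(v) - start
        k = offset // width            # index of the class (an integer)
        if offset % width == 0:        # exactly on a class-limit
            counts[k - 1] += Fraction(1, 2)
            counts[k] += Fraction(1, 2)
        else:
            counts[k] += 1
    return counts

# Hypothetical birth-rates, classes 15.5-16.5, 16.5-17.5, 17.5-18.5.
rates = ["16.5", "16.8", "17.5", "17.2", "15.9"]
counts = classify(rates, "15.5", "1")
for k in sorted(counts):
    print(k, counts[k])
```

The class-frequencies still add up to the number of observations, but half-units appear wherever an observation fell on a limit.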

Tabulation. 

6.10. As regards the actual drafting of the final table there is little 
to be said, except that care should be taken to express the class-limits 
clearly and, if necessary, to say how the difficulty of intermediate values 
has been met or evaded. The class-limits are perhaps best given as in 
Tables 6.1 and 6.3, but may be more briefly indicated by the mid-values of 
the class-intervals. Thus, Table 6.1 might have been given in the form: 


Birth-rate per 1000 to 
the Nearest Unit. 


Number of Districts with 
said Birth-rate. 


2 

3 

4 

etc, 


1 

*> 

2 

etc. 


It should be noticed that the method of defining class-intervals adopted 
in Table 6.3 leaves the class-limits uncertain unless the degree of accuracy 
of the measurements is also given. Thus, in a table giving frequencies of 
men in certain height-ranges of 1 inch in width, say “ 57 and less than 58,” 
etc., if measurements were taken to the nearest eighth of an inch, the class-limits 
are really 56 15/16-57 15/16, 57 15/16-58 15/16, etc.; if they were only taken to 
the nearest quarter of an inch, the limits are 56 7/8-57 7/8, 57 7/8-58 7/8, etc. With 
such a form of tabulation a statement as to the number of significant figures 
in the original record is therefore essential. It is better, perhaps, to state 
the true class-limits and avoid ambiguity. 
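
The dependence of the true class-limits on the precision of measurement can be checked with a short computation. The sketch below is illustrative only (the function name is hypothetical): a class stated as "57 and less than 58" covers recorded values from 57 up to one unit short of 58, and each recorded value stands for true values within half a unit on either side.

```python
from fractions import Fraction

def true_limits(lower, upper, unit):
    """True limits of a class stated as 'lower and less than upper'
    when measurements are recorded to the nearest `unit`."""
    half = Fraction(unit) / 2
    return Fraction(lower) - half, Fraction(upper) - half

eighths = true_limits(57, 58, Fraction(1, 8))    # 56 15/16 and 57 15/16
quarters = true_limits(57, 58, Fraction(1, 4))   # 56 7/8 and 57 7/8
print(eighths, quarters)
```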

6.11. The rule that class-intervals should be all equal is one that is 
very frequently broken in official statistical publications, principally in 
order to condense an otherwise unwieldy table, thus not only saving space 
in printing but also considerable expense in compilation, or possibly, in the 
case of confidential figures, to avoid giving a class which would contain 
only one or two observations, the identity of which might be guessed. It 
would hardly be legitimate, for example, to give a return of incomes relating 
to a limited district in such a form that the income of the two or three 
wealthiest men in the district would be clear to any intelligent reader with 
local knowledge. 

If the class-intervals be made unequal, the application of many statistical 
methods is rendered awkward, or even impossible. Further, the 
relative values of the frequencies are misleading, so that the table is not 
perspicuous. Thus, consider the first two columns of Table 6.5, showing 




the number of persons liable to sur-tax and super-tax classified according 
to their annual income. On running the eye down the column headed 
“Number of Persons,” the attention is at once caught by the three irregularities 
at the classes “£3000 and not exceeding £4000,” “£8000 and 
not exceeding £10,000,” and “£10,000 and not exceeding £15,000.” But 
these have no real significance; they are merely due to changes in the 
magnitude of the class-interval at those points. A further change occurs 
at the £30,000 and at the £50,000 mark, although the attention is not 
directed thereto by any marked irregularity in the frequencies. 

Table 6.5. Showing the Numbers of Persons in the United Kingdom liable to Sur-tax 
and Super-tax in the Year beginning 5th April 1931, classified according to the 
Magnitude of their Annual Income. (From the Statistical Abstract for the United 
Kingdom for the Years 1913 and 1919-32, Cmd. 4489.) 

Annual Income                     Number of     Frequency per 
(£000).                           Persons.      £500 Interval. 

  2   and not exceeding   2.5      23,988          23,988 
  2.5 and not exceeding   3        15,781          15,781 
  3   and not exceeding   4        17,979           8,989 
  4   and not exceeding   5         9,755           4,877 
  5   and not exceeding   6         5,921           2,960 
  6   and not exceeding   7         3,729           1,864 
  7   and not exceeding   8         2,546           1,273 
  8   and not exceeding  10         3,193             798 
 10   and not exceeding  15         3,616             362 
 15   and not exceeding  20         1,328             133 
 20   and not exceeding  25           679              68 
 25   and not exceeding  30           378              38 
 30   and not exceeding  40           372              19 
 40   and not exceeding  50           192              10 
 50   and not exceeding  75           182               4 
 75   and not exceeding 100            57               1 
100   and over                         94               ? 

Total number of persons            89,790               - 


To make the class-frequencies really comparable inter se they must first 
be reduced to a common interval as basis, say £500, by dividing the third 
and subsequent numbers by 2, the eighth by 4, and so on. This gives 
the mean frequencies tabulated in the third column of Table 6.5. The 
reduction is, however, impossible in the case of the last class, for we are 
told only the number of persons with an income of £100,000 and upwards. 
Such an indefinite class is in many respects a great inconvenience, and 
should always be avoided in work not subjected to the necessary limitations 
of official publications. 
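
The reduction to a common interval is a simple proportion, sketched below in Python for the closed classes of Table 6.5. The third class-frequency is taken as 17,979, the reading consistent with the printed total of 89,790; the function name is hypothetical, and no rounding is applied, so the results differ slightly from the whole numbers printed in the table.

```python
# (lower limit, upper limit) in thousands of pounds, and number of persons.
classes = [
    (2, 2.5, 23988), (2.5, 3, 15781), (3, 4, 17979), (4, 5, 9755),
    (5, 6, 5921), (6, 7, 3729), (7, 8, 2546), (8, 10, 3193),
    (10, 15, 3616), (15, 20, 1328), (20, 25, 679), (25, 30, 378),
    (30, 40, 372), (40, 50, 192), (50, 75, 182), (75, 100, 57),
]

def per_common_interval(lower, upper, frequency, basis=0.5):
    """Mean frequency per interval of width `basis` (here 0.5, i.e. 500 pounds)."""
    return frequency * basis / (upper - lower)

reduced = [per_common_interval(lo, hi, f) for lo, hi, f in classes]
for (lo, hi, f), r in zip(classes, reduced):
    print(lo, hi, f, round(r, 1))
```

The open class "£100,000 and over" is omitted, since it has no upper limit and cannot be reduced in this way.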

6.12. The general rule that intervals should be equal must not be held 
to bar the analysis by smaller equal intervals of some portion of the range 
over which the frequency varies very rapidly. In Table 6.11, page 100, 
for example, giving the numbers of deaths from scarlet fever at successive 
ages, it is desirable to give the numbers of deaths in each year for the first 
five years, so as to bring out the rapid rise to the maximum in the third 
year of life. 








Graphical Representation: Frequency-polygon and Histogram. 

6.13. It is often convenient to represent the frequency-distribution 
by means of a diagram which conveys to the eye the general run of the 
observations. The following short table, giving the distribution of head- 
breadths for 1000 men, will serve as an example: — 

Table 6.6. Showing the Frequency-distribution of Head-breadths for Students at 
Cambridge. Measurements taken to the nearest Tenth of an Inch. (Cited from 
W. R. Macdonell, Biometrika, vol. 1, 1902, p. 220.) 

Head-breadth    Number of          Head-breadth    Number of 
in Inches.      Men with said      in Inches.      Men with said 
                Head-breadth.                      Head-breadth. 

    5.5               3                6.3              99 
    5.6              12                6.4              37 
    5.7              43                6.5              15 
    5.8              80                6.6              12 
    5.9             131                6.7               3 
    6.0             236                6.8               2 
    6.1             185 
    6.2             142               Total           1000 


Taking a piece of squared paper ruled, say, in inches and tenths, mark 
off along a horizontal base-line a scale representing class-intervals ; a 
half-inch to the class-interval would be suitable. Then choose a vertical 
scale for the class-frequencies, say 50 observations per interval to the inch, 
and mark off, on the verticals or ordinates through the points marked 5.5, 
5.6, 5.7, ... at the centres of the class-intervals on the base-line, heights 
representing on this scale the class-frequencies 3, 12, 43, ... The diagram 
may then be completed in one of two ways: (1) as a frequency-polygon, 
by joining up the marks on the verticals by straight lines, the last points at 
each end being joined down to the base at the centre of the next class-interval 
(fig. 6.1); or (2) as a column diagram or histogram, short 
horizontals being drawn through the marks on the verticals (fig. 6.2), which 
now form the central axes of a series of rectangles representing the class- 
frequencies. 

6.14. The student should note that in any such diagram, of either 
form, a certain area represents a given number of observations. On the 
scales suggested, 1 inch on the horizontal represents 2 intervals, and 1 inch 
on the vertical represents 50 observations per interval: 1 square inch 
therefore represents $50 \times 2 = 100$ observations. The diagrams are, however, 
conventional: in both cases the whole area of the figure is proportional 
to the total number of observations, but the area over every 
interval is not correct in the case of the frequency-polygon, and the 
frequency of every fraction of any interval is not the same, as suggested 
by the histogram. The area shown by the frequency-polygon over any 
interval with an ordinate $y_2$ (fig. 6.3) is only correct if the tops of the three 
successive ordinates $y_1$, $y_2$, $y_3$ lie on a line, i.e. if $y_2 = \tfrac{1}{2}(y_1 + y_3)$, the areas of 
the two little triangles shaded in the figure being equal. If $y_2$ fall short of 
this value, the area shown by the polygon is too great; if $y_2$ exceed it, 
the area shown by the polygon is too small; and if, for this reason, the 





frequency-polygon tends to become very misleading at any part of the 
range, it is better to use the histogram. 
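
The size of the error can be worked out directly. Over the interval centred on the ordinate y2 the polygon consists of two trapezia, each half an interval wide, so that the area it shows is (y1 + 6 y2 + y3)/8 per unit width, which equals y2 only when y2 is the mean of y1 and y3. The sketch below applies this to three neighbouring classes of the head-breadth table, reading the frequency at 5.9 inches as 131, the value which the total of 1000 requires; the function name is hypothetical.

```python
def polygon_area_over_interval(y1, y2, y3, width=1.0):
    """Area under the frequency-polygon over the interval whose central
    ordinate is y2, with neighbouring ordinates y1 and y3.

    The width is measured in class-intervals, so the area is expressed
    in observations and may be compared directly with y2.
    """
    return width * (y1 + 6 * y2 + y3) / 8

# Head-breadths 5.8, 5.9, 6.0 and 6.1 inches: frequencies 80, 131, 236, 185.
at_5_9 = polygon_area_over_interval(80, 131, 236)    # true frequency 131
at_6_0 = polygon_area_over_interval(131, 236, 185)   # true frequency 236
print(at_5_9, at_6_0)
```

At 5.9 inches the ordinate falls short of the mean of its neighbours and the polygon shows too much area (137.75 against 131); at 6.0 inches the ordinate exceeds the mean and the polygon shows too little (216.5 against 236).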




Fig. 6.2. Histogram for the same data as fig. 6.1. 


6.15. The histogram may also be used when the class-intervals are 
unequal. The construction of the previous section is easily adapted to 





such cases. All that is necessary is to describe an area equal, on the scale 
adopted, to the frequency in a particular interval; this is done, as before, 
by erecting at the centre of the interval an ordinate equal in length to 
the total frequency divided by the width of the interval. 

An example of this kind of construction is given in fig. 6.11 (Table 
6.11). The frequencies of deaths for ages over 5 years are given in 5-yearly 
periods, whereas those for ages under 5 years are given in 1-yearly periods. 
On the scale indicated, therefore, the height of the cell of the histogram 
corresponding to the ages 2-3 years is 89, the class-frequency; that of the 
cell corresponding to the ages 5-10 is 42.6, i.e. 213 divided by 5. Hence the 
areas of the two cells are, to the scale adopted, 89 and 213, respectively, 
so that the areas accurately represent the frequencies. 



Frequency-curves. 

6.16. If the class-intervals be made smaller, and at the same time 
the number of observations increased so that the class-frequencies may 
remain finite, the polygon and the histogram will approach more and 
more closely to a smooth curve. Such an ideal limit to the polygon or 
the histogram is called a frequency-curve. It is a concept of supreme 
importance in statistical theory. 

In the frequency-curve the area between any two ordinates whatever 
is proportional to the number of observations falling between the corresponding 
values of the variable. Thus, the number of observations 
falling between the values of the variable $x_1$ and $x_2$ in fig. 6.4 will be 
proportional to the area of the shaded strip in the figure; the number of 
observed values greater than $x_2$ will be given by the area of the curve to 
the right of the ordinate at $x_2$; and so on. 

6.17. When we come to consider the theory of sampling we shall 
regard the frequency-curve as representing a universe from which the 
actual data are a specimen. The frequency-polygon and the histogram 
will then be approximations to the curve, but will diverge from it to 
some extent owing to fluctuations of sampling. For the present we must 
defer a closer inquiry into this subject. We may remark, however, that 
when the number of observations is considerable, say a thousand at 
least, the run of the class-frequencies is usually sufficiently smooth to 
give a good notion of the form of the “ideal” distribution. 

Some Common Types of Frequency-distribution. 

6.18. The forms presented by smoothly running sets of data are 
almost endless in their variety, but among them we may notice a comparatively 
small number of simple types. Such types also form a set 
into which more complex distributions may often be analysed. For 
elementary purposes it is sufficient to consider four fundamental simple 
types, which we shall call the symmetrical distribution, the moderately 
asymmetrical or skew distribution,¹ the extremely asymmetrical or 
J-shaped distribution and the U-shaped distribution. In the following 
sections we give some examples of each of these types, together with a 
few more complex distributions. 

The Symmetrical Distribution. 

6.19. In this type the class-frequencies decrease to zero symmetrically 
on either side of a central maximum. Fig. 6.5 illustrates the ideal 
form of the distribution. 



Fig. 6.5. —An Ideal Symmetrical Frequency-distribution. 


¹ These two types, from their shape, are frequently referred to as “humped,” 
“cocked hat,” “single peaked,” and so on. 




Being a special case of the more general type described under the 
second heading, this form of distribution is comparatively rare. It 
occurs in the case of biometric, more especially anthropometric, measurements, 
from which the following illustration is drawn, and is important 
in much theoretical work. Table 6.7 shows the frequency-distribution of 
statures for adult males born in the British Isles, from data published by a 

Table 6.7. Showing the Frequency-distributions of Statures for Adult Males born in 
England, Scotland, Wales and Ireland. (Final Report of the Anthropometric 
Committee to the British Association, Report, 1883, p. 256.) As Measurements 
are stated to have been taken to the nearest 1/8th of an Inch, the Class-intervals are here 
presumably 56 15/16-57 15/16, 57 15/16-58 15/16, and so on (cf. 6.9). (See fig. 6.6.) 

Height without     Number of Men within said Limits of Height. 
shoes, Inches.     Place of Birth: 
                   England.   Scotland.   Wales.   Ireland.    Total. 

    57-                1          -          1        -           2 
    58-                3          1          -        -           4 
    59-               12          -          1        1          14 
    60-               39          2          -        -          41 
    61-               70          2          9        2          83 
    62-              128          9         30        2         169 
    63-              320         19         48        7         394 
    64-              524         47         83       15         669 
    65-              740        109        108       33         990 
    66-              881        139        145       58        1223 
    67-              918        210        128       73        1329 
    68-              886        210         72       62        1230 
    69-              753        218         52       40        1063 
    70-              473        115         33       25         646 
    71-              254        102         21       15         392 
    72-              117         69          6       10         202 
    73-               48         26          2        3          79 
    74-               16         15          1        -          32 
    75-                9          6          1        -          16 
    76-                1          4          -        -           5 
    77-                1          1          -        -           2 

    Total           6194       1304        741      346        8585 


British Association Committee in 1883, the figures being given separately 
for persons born in England, Scotland, Wales and Ireland, and totalled 
in the last column. These frequency-distributions are approximately of 
the symmetrical type. The frequency-polygon for the totals given by 
the last column of the table is shown in fig. 6.6. The student will notice 
that an error of 1/16 inch, scarcely appreciable in the diagram on its reduced 
scale, is neglected in the scale shown on the base-line, the intervals being 
treated as if they were 57-58, 58-59, etc. Diagrams should be drawn for 
comparison showing, to a good open scale, the separate distributions for 
England, Scotland, Wales and Ireland. 

The Moderately Asymmetrical (Skew) Distribution. 

6.20. In this case the class-frequencies decrease with markedly 
greater rapidity on one side of the maximum than on the other, as in 





fig. 6.7 (a) or (b). This is the most common of all smooth forms of 
frequency-distribution, illustrations occurring in statistics from almost 
every source. The distribution of birth-rates given in Table 6.1 is slightly 
asymmetrical. 

Fig. 6.7. Ideal Distributions of the Moderately Asymmetrical Form: (a) and (b). 

The distribution of Australian marriages given in Table 6.8 (fig. 6.8) 
is rather more asymmetrical and is of the type (a) of fig. 6.7. The 
frequency attains its maximum for ages between 24 and 27 and then 
tails off slowly. We have not drawn the tail of the curve, which is very 
close to the x-axis, for values of the variate above 58.5. 







Table 6.8. Showing Numbers of Marriages Contracted in Australia, 1907-11, arranged 
according to the Age of Bridegroom in 3-Year Groups. (From S. J. Pretorius, 
“Skew Bivariate Frequency Surfaces,” Biometrika, vol. 22, 1930-31, p. 210.) (See 
fig. 6.8.) 

Age of Bridegroom          Number of      Age of Bridegroom          Number of 
(Central Value of 3-Year   Marriages.     (Central Value of 3-Year   Marriages. 
Range, in Years).                         Range, in Years). 

       16.5                    294               55.5                  1,655 
       19.5                 10,995               58.5                  1,100 
       22.5                 61,001               61.5                    810 
       25.5                 73,054               64.5                    649 
       28.5                 56,501               67.5                    487 
       31.5                 33,478               70.5                    326 
       34.5                 20,569               73.5                    211 
       37.5                 14,281               76.5                    119 
       40.5                  9,320               79.5                     73 
       43.5                  6,236               82.5                     27 
       46.5                  4,770               85.5                     14 
       49.5                  3,620               88.5                      5 
       52.5                  2,190 
                                                 Total               301,785 



Table 6.9 and fig. 6.9 give a biological illustration, viz. the distribution 
of fecundity (ratio of yearling foals produced to coverings) in mares. 
The student should notice the difficulty of classification in this case: 
the class-interval chosen throughout the middle of the range is 1/15th, 
but the last interval is “29/30-1.” This is not a whole interval, but it 
is more than a half, for all the cases of complete fecundity are reckoned 
into the class. In the diagram (fig. 6.9) it has been reckoned as a whole 
class, and this gives a smooth distribution. 

To take an illustration from meteorology, the distribution of barometer 
heights at any one station over a period of time is, in general, asymmetrical, 
the most frequent heights lying towards the upper end of the range for 
stations in England and Wales. Table 6.10 and fig. 6.10 show the distribution 
for daily observations at Greenwich during the years 1848-1926 
inclusive. 

The distributions of Tables 6.8-6.10 all follow more or less the type 
of fig. 6.7 (a), the frequency tailing off, at the steeper end of the distribution, 
in such a way as to suggest that the ideal curve is tangential to the 
base. Cases of greater asymmetry, suggesting an ideal curve that meets 
the base (at one end) at a finite angle, even a right angle, as in fig. 6.7 (b), 
are less frequent, but occur occasionally. The distribution of deaths 
from scarlet fever, according to age, affords one such example of a more 
asymmetrical kind. The actual figures for this case are given in 
Table 6.11 and illustrated by fig. 6.11; and it will be seen that the 
frequency of deaths reaches a maximum for children aged “2 and under 
3,” the number rising very rapidly to the maximum, and thence falling 
so slowly that there is still an appreciable frequency for persons over 
50 years of age. 

Asymmetrical curves are also said to be “skew.” In Chapter 9 
we shall consider skewness at some length and discuss various ways of 
measuring it. In particular we shall find that skewness has a sign, and 
is positive when the longer tail of the distribution lies to the right and 
negative when it lies to the left; e.g. the curve of fig. 6.8 has positive 
skewness, whilst those of figs. 6.9 and 6.10 have negative skewness. 

Table 6.9. Showing the Frequency-distribution of Fecundity, i.e. the Ratio of the Number 
of Yearling Foals Produced to the Number of Coverings, for Brood-mares (Race-horses) 
Covered Eight Times at Least. (Pearson, Lee and Moore, Phil. Trans., A, 
vol. 192, 1899, p. 303.) (See fig. 6.9.) 

Fecundity.       Number of Mares       Fecundity.       Number of Mares 
                 with Fecundity                         with Fecundity 
                 between the                            between the 
                 Given Limits.                          Given Limits. 

 1/30- 3/30           2                17/30-19/30          315 
 3/30- 5/30           7.5              19/30-21/30          337 
 5/30- 7/30          11.5              21/30-23/30          293.5 
 7/30- 9/30          21.5              23/30-25/30          204 
 9/30-11/30          55                25/30-27/30          127 
11/30-13/30         104.5              27/30-29/30           49 
13/30-15/30         182                29/30-1               19 
15/30-17/30         271.5 
                                       Total               2000.0 



Fig. 6.9. Frequency-distribution of Fecundity for Brood-mares. (Table 6.9.) 

The Extremely Asymmetrical, or J-shaped, Distribution. 

6.21. In this type the class-frequencies run up to a maximum at one 
end of the range, as in fig. 6.12. 

This may be regarded as a limiting form of the previous distribution, 
and, in fact, the two cannot always be distinguished by elementary methods 
if the original data are not available. If, for instance, the frequencies of 
Table 6.11 had been given by five-year intervals only, they would have run 
322, 213, 70, 27, etc., thus suggesting that the maximum number of deaths 




Table 6.10. Showing Barometric Heights at Greenwich on Alternate Days from 1848-1926. 
(Data from S. J. Pretorius, “Skew Bivariate Frequency Surfaces,” Biometrika, 
vol. 22, 1930-31, p. 154.) (See fig. 6.10.) 

Barometric Height    Number of Days.    Barometric Height    Number of Days. 
(Central Value in                       (Central Value in 
Inches).                                Inches). 

     28.35                  1                29.65               3176 
     28.45                  4                29.75               3700 
     28.55                 12                29.85               3921 
     28.65                 43                29.95               3749 
     28.75                 60                30.05               2951 
     28.85                 81                30.15               1951 
     28.95                189                30.25               1148 
     29.05                282                30.35                563 
     29.15                542                30.45                258 
     29.25                813                30.55                 73 
     29.35               1233                30.65                 13 
     29.45               1752                30.75                  7 
     29.55               2333 
                                             Total             28,855 



Fig. 6.10. Barometric Height at Greenwich on Alternate Days from 
1848-1926. (Table 6.10.) Horizontal axis: barometric height (inches). 

occurred at the beginning of life, i.e. that the distribution was J-shaped. 
It is only the analysis of deaths in the earlier years by one-year intervals 
which shows that the frequencies reach a maximum in the third year and 
that therefore the distribution is of the moderately asymmetrical type. 
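
The effect of the coarser grouping is easily verified. The sketch below, in Python, merges the one-year classes of Table 6.11 for ages under 20 into five-year classes; the function name is hypothetical, and it assumes that every original class lies wholly within one of the wider classes.

```python
# (lower age, width of class in years, deaths) for ages under 20, Table 6.11.
deaths = [(0, 1, 16), (1, 1, 69), (2, 1, 89), (3, 1, 74), (4, 1, 74),
          (5, 5, 213), (10, 5, 70), (15, 5, 27)]

def regroup(rows, width):
    """Merge classes into wider classes of the given width, keyed by the
    lower limit of each wider class."""
    grouped = {}
    for lower, _, freq in rows:
        key = (lower // width) * width
        grouped[key] = grouped.get(key, 0) + freq
    return grouped

five_year = regroup(deaths, 5)
print(five_year)   # {0: 322, 5: 213, 10: 70, 15: 27}
```

In the five-year grouping the greatest frequency stands in the first class, and the maximum in the third year of life is no longer visible.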






Table 6.11. Showing the Number of Deaths from Scarlet Fever at Different Ages in 
England and Wales in 1933. (Data from Registrar-General's Statistical Review 
of England and Wales for 1933, Tables, Part I, Medical, supplemented by information 
supplied by him in correspondence.) (See fig. 6.11.) 

Age in Years.    Number of Deaths.    Number per Year. 

     0-                 16                 16 
     1-                 69                 69 
     2-                 89                 89 
     3-                 74                 74 
     4-                 74                 74 
     5-                213                 42.6 
    10-                 70                 14.0 
    15-                 27                  5.4 
    20-                 26                  5.2 
    25-                 17                  3.4 
    30-                 12                  2.4 
    35-                 11                  2.2 
    40-                 10                  2.0 
    45-                  6                  1.2 
    50-                  7                  1.4 
    55-                  5                  1.0 
    60-                  -                  - 
    65-                  1                  0.2 
    70-                  1                  0.2 
    75-                  1                  0.2 
    80-                  -                  - 

    Total              729                  - 


In practical cases no hard-and-fast rule can be drawn between the moderately 
and extremely asymmetrical types, any more than between the 
asymmetrical and the symmetrical types. 

6.22. In economic statistics this form of distribution is particularly 
characteristic of the distribution of wealth in the population at large, as 
illustrated by income tax and house valuation returns, and the curve to 
which it gives rise has been called the “Pareto line,” after Vilfredo Pareto, 
who directed the attention of economists to it (vide ref. (99)). The student 
should draw the histogram of the data of Table 6.5 in illustration of this 
point. 

Such distributions may, of course, be a very extreme case of the last 
type. It is difficult to say. But if the maximum is not absolutely at the 
lower end of the range, it is very close thereto. 

Official returns do not usually give the necessary analysis of the 
frequencies at the lower end of the range to enable the exact position of the 
maximum to be determined ; and for this reason the data on which Table 
6.12 is founded, though of course very unreliable, are of some interest. It 
will be seen from the table and fig. 6.13 that with the given classification 
the distribution appears clearly assignable to the present type, the number 
of estates between zero and £100 in annual value being more than six times 
as great as the number between £100 and £200 in annual value, and the 
frequency continuously falling as the value increases. A close analysis of 
the first class suggests, however, that the greatest frequency does not occur 




actually at zero, but that there is a true maximum frequency for estates of 
about £1 15/- in annual value. The distribution might therefore be more 
correctly assigned to the second type, but the position of the greatest 
frequency indicates a degree of skewness which is high even compared 
with the skewness of fig. 6.11. 

The type is not very frequent in other classes of material, but instances 
occur here and there. Distributions of deaths of centenarians afford an 



Fig. 6.11. Histogram of Number of Deaths from Scarlet Fever for 
Various Ages. (Table 6.11.) 

example, and so, curiously enough, do deaths of infants unless the class-interval 
is exceedingly fine, a matter of hours. It has also been shown 
that the distribution may be obtained by compiling the frequencies of the 
numbers of genera with 1, 2, 3, ... species in any biological group. 
Table 6.13 shows such a distribution for the Chrysomelid beetles. 

The U-shaped Distribution. 

6.23. This type exhibits a maximum frequency at the ends of the 
range and a minimum towards the centre, as in fig. 6.14. 

This is a rare but interesting form of distribution, as it stands in somewhat 
marked contrast to the preceding forms. Table 6.14 and fig. 6.15 




illustrate an example based on a considerable number of observations, viz. 
the distribution of degrees of cloudiness, or estimated percentage of the sky 
covered by cloud, at Greenwich in July. 

For the purposes of the illustration we regard cloudiness as a variate 
varying from complete overcastness to clear sky, the range being divided 
into eleven equal parts. 

It will be seen that a sky completely or almost completely overcast at 



Fig. 6.12. An Ideal Distribution of the Extremely Asymmetrical Form. 

the time of observation is the most common, a practically clear sky comes 
next, and the intermediates are more rare. 

The remarks we made about the extreme end of the J-shaped distribution 
also apply to the U-shaped distribution. In particular cases it 
may be that the grouping is too coarse to reveal the true character of the 
frequency at the maxima, and if the data were more complete we might 
discover that the two arms of the U in fact were bent over. 

Truncated Forms. 

6.24. The four types we have been considering sometimes occur in 
an incomplete form. Certain limitations on the range of the variate may 
result in a kind of truncation at one end or the other. Consider, for 
example, Table 6.15, p. 107. In obtaining these figures, twelve dice were 
thrown and the occurrence of a 6 was called a success. At one throw there 
could thus be any number of successes from 0 to 12. The dice were thrown 
4096 times. 
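
The experiment can be imitated with pseudo-random numbers. The sketch below is a simulation under the assumption of fair dice, not a reproduction of the recorded throws of Table 6.15; the function name and the seed are arbitrary choices made for the illustration.

```python
import random
from collections import Counter

def throw_dice(n_throws=4096, n_dice=12, seed=0):
    """Simulate throws of `n_dice` fair dice and tally the number of
    sixes (successes) obtained at each throw."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_throws):
        successes = sum(1 for _ in range(n_dice) if rng.randint(1, 6) == 6)
        counts[successes] += 1
    return counts

counts = throw_dice()
for k in range(13):
    print(k, counts[k])
```

The tally cannot extend below zero successes, so that the frequencies are cut short on the left while tailing off gradually on the right.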






Fig. 6.13. Frequency-distribution of the Annual Values of certain Estates
in England in 1715; 2476 Estates. (Table 6.12.)

Fig. 6.16 gives the frequency-polygon for this distribution. We can
picture it as a slightly skew distribution which has been cut off on the left
owing to the inadmissibility of negative values of the variate. Discontinuous
variates not infrequently give rise to this effect of truncation.

Complex Distributions. 

6.25. Table 6.16 gives the number of male deaths within certain
age-limits for England and Wales in the years 1930-32.

The histogram for these data is given in fig. 6.17. It will be seen that
the distribution has three maxima, one for each of the 0-5, the 20-25 and
the 70-75 age-groups.

Without looking too closely into this mortality curve we can see
that the high frequency at the beginning is undoubtedly due to the heavy
infantile death-rate. We can, if we choose, regard the distribution as
made up by the superposition of three others: a J-shaped distribution
for the lower years, a small one-humped distribution with its maximum
about the period 20-25 years, and a skew distribution for the higher
ages. This is an example of the fact we have already mentioned, that
a complex distribution can sometimes be analysed into simpler types.
In this particular case the analysis is likely to be of real service in actuarial
work and in investigations into the causes of death.

Table 6.12. Showing the Numbers and Annual Values of the Estates of those who
had taken part in the Jacobite Rising of 1715. (Compiled from Cosin's "Names
of the Roman Catholics, Nonjurors, and others who Refused to take the Oaths to
his late Majesty King George," etc.; London, 1745. Figures of very doubtful
absolute value. See a note in Southey's "Commonplace Book," vol. 1, p. 573,
quoted from the Memoirs of T. Hollis.) (See fig. 6.13.)

Annual Value   Number of    Annual Value   Number of
in £100.       Estates.     in £100.       Estates.

   0- 1        1720.5          17-18          1
   1- 2         280              -            -
   2- 3         140.5          20-21          *
   3- 4          87            21-22          1
   4- 5          40.5          22-23          1
   5- 6          42.5          23-24          1
   6- 7          29.5            -            -
   7- 8          25.5          27-28          2
   8- 9          18.5            -            -
   9-10          21            31-32          1
  10-11          11.5            -            -
  11-12           9.5          39-40          1
  12-13           4              -            -
  13-14           3.5          45-46          1
  14-15           8              -            -
  15-16           3            48-49          1
  16-17           5

                               Total        2476

6.26. Finally, we give an example of a pseudo-frequency-distribution
of a type occasionally resorted to when the data can be classified according
to a characteristic which, though not strictly speaking measurable, can
nevertheless be graduated in an ordered sequence. Such a case arises
fairly often in psychological work.

A list of 100 words was read out to each of 11 subjects. Subsequently, 
at 15-minute intervals, four fresh lists were read out which contained 25 
of the words in the original and 25 new words, the four taken together
accounting for the whole of the original 100. The subject had to say 
whether these individual words were in the original list or not, and to 
state whether he was certain, fairly sure, doubtful but inclined one way 
or the other, or merely doubtful. The various phases of belief were 
then allotted numbers, and ran from -3 (certainty that a word was not 
in the original) through 0 (doubt, without inclination one way or the other) 
to +3 (certainty that a word was in the original). The tabulation on p. 108
sets out the results for words in the original list (data reproduced by 
permission from the records of the Department of Psychology, University 
of St Andrews). 





Table 6.13. Chrysomelidae (beetles). Numbers of Genera with 1, 2, 3, . . . Species.
(Compiled by Dr J. C. Willis, F.R.S.; cited from G. U. Yule, "A Mathematical
Theory of Evolution based on the Conclusions of Dr J. C. Willis," Phil. Trans.,
B, vol. 213, 1924, p. 85.)

Species.  Genera.      Species.  Genera.      Species.  Genera.

    1       215           32        1            74        1
    2        90           33        1            76        1
    3        38           34        1            77        1
    4        35           35        1            79        1
    5        21           36        3            83        1
    6        16           37        1            84        3
    7        15           38        1            87        2
    8        14           39        2            89        1
    9         5           40        2            92        2
   10        15           41        1            93        1
   11         8           43        4           110        1
   12         9           44        1           114        1
   13         5           45        1           115        1
   14         6           46        1           128        1
   15         8           49        2           132        1
   16         6           50        4           133        1
   17         6           52        1           146        1
   18         3           53        1           163        1
   19         4           56        1           196        1
   20         3           58        1           217        1
   21         4           59        1           227        1
   22         4           62        1           264        1
   23         5           63        3           327        1
   24         4           65        1           399        1
   25         2           66        1           417        1
   26         3           67        1           681        1
   27         1           69        1
   28   [illegible]       71   [illegible]
   29         3           72        1
   30         3           73        1          Total     627




Table 6.14. Showing the Frequencies of Estimated Intensities of Cloudiness at
Greenwich during the Years 1890-1904 (excluding 1901) for the Month of July.
(Data from Gertrude E. Pearse, Biometrika, vol. 20A, 1928, p. 886.)
(See fig. 6.15.)

Degrees of                   Degrees of
Cloudiness.   Frequency.     Cloudiness.   Frequency.

    10           676              4            45
     9           148              3            68
     8            90              2            74
     7            65              1           129
     6            55              0           320
     5            45
                                Total        1715






Table 6.15. Twelve Dice thrown 4096 Times, a Throw of 6 Points reckoned as a
Success. (Weldon's data; cited by F. Y. Edgeworth, Encyclopaedia Britannica,
11th ed., vol. 22, p. 39.) (See fig. 6.16.)

Number of Successes.   Number of Throws.

      0                     447
      1                    1145
      2                    1181
      3                     796
      4                     380
      5                     115
      6                      24
      7 and over              8

    Total                  4096



Fig. 6.16. Frequency Polygon of Successes with Dice Throwing. (Table 6.15.)


Table 6.16. Showing the Number of Male Deaths in England and Wales for 1930-32,
classified by Ages at Death. (Data from Registrar-General's Statistical Review
of England and Wales, 1900, Text.) (See fig. 6.17.)

Age at Death   Number of      Age at Death   Number of
(years).       Deaths.        (years).       Deaths.

   0- 5         97,290          55- 60         56,609
   5-10         11,532          60- 65         68,100
  10-15          7,305          65- 70         80,690
  15-20         13,062          70- 75         84,041
  20-25         10,741          75- 80         72,180
  25-30         10,120          80- 85         45,094
  30-35         15,673          85- 90         19,913
  35-40         18,345          90- 95          5,145
  40-45         23,778          95-100            767
  45-50         30,158         100 and over        48
  50-55         43,812
                                Total         729,442








Words in the original list were classified as:

                       In                 Neither             Out
             Certain.  Fairly  Doubtful.  In or    Doubtful.  Fairly  Certain.
                       Sure.              Out.                Sure.

Score          +3       +2       +1         0         -1       -2       -3
Frequency      540      117      63         39        63       87       191


These results are very curious, and are borne out by other data of a
similar kind. In particular we see that there were more cases of certainty
about something which was not true than of doubt without inclination.



Fig. 6.17. Histogram of Number of Deaths at Various Ages. (Table 6.16.)


In this example we are clearly making some assumption in allotting
numbers to various degrees of belief; but it would be impossible to
measure belief on a scale, and we have to do the best we can. The numbers
attached to the variate in such cases are not measures, but convenient
ordinals, like the numbers attached to kings of the same name. For
this reason a frequency diagram of such data can only give a very general
idea of their true nature.


SUMMARY. 

1. Data in which the individuals are specified by the numerical values
of a variable, or variate, may with convenience be arranged in a table
which gives the frequency lying within successive, preferably equal, ranges
of the variable. Such an arrangement is called a frequency-distribution.

2. The frequency-distribution can be represented diagrammatically by
means of a frequency-polygon or a histogram.

3. The histogram is particularly appropriate to cases in which the
frequency changes rapidly or the class-intervals are not all of the same
width.

4. As the width of the class-intervals becomes smaller, the frequency-polygon
or the histogram may be imagined to approach a smooth curve,
which is called the frequency-curve.

5. A large number of frequency-distributions occurring in practice
fall into four types: the symmetrical, the moderately asymmetrical or
skew, the extremely asymmetrical or J-shaped and the U-shaped types.
Certain other distributions can be analysed into constituents each of
which belongs to one of these types.


EXERCISES. 

6.1. If the diagram fig. 6.0 is redrawn to scales of 000 observations per interval
to the inch and 4 inches of stature to the inch, what is the scale of observations
to the square inch?

If the scales are 100 observations per interval to the centimetre and 2 inches
of stature to the centimetre, what is the scale of observations to the square
centimetre?

6.2. If fig. 6.10 is redrawn to scales of 000 days to the inch and 0*0 inch of
barometric height to the inch, what is the scale of observations to the square
inch?

If the scales are 100 days to the centimetre and 0.1 inch of barometric height
to the centimetre, what is the scale of observations to the square centimetre?

6.3. If a frequency-polygon be drawn to represent the data of Table 6.1,
what number of observations will the polygon show between birth-rates of 16.5
and 17.5 per thousand, instead of the true number 80?

6.4. If a frequency-polygon be drawn to represent the data of Table 0.0,
what number of observations will the polygon show between head-breadths
5.95 and 6.05, instead of the true number 200?

6.5. Draw frequency-polygons or histograms, as the case seems to require,
for the following distributions, and assign them to the four types we have
enumerated in 6.18.

(a) Size of Firms in the Food, Drink and Tobacco Trades of Great Britain. (Final
Report of the Fourth Census of Production, 1930, Part III.) The following table
shows the number of firms employing on an average certain numbers of persons:

Size of Firm (Average       Number of
Numbers Employed).          Firms.

     11-  24                [illegible]
     25-  49                  1149
     50-  99                   771
    100- 199                [illegible]
    200- 299                   164
    300- 399                    75
    400- 499                    36
    500- 749                    51
    750- 999                    31
   1000-1499                    23
   1500 and over                29

   Total                      5316


(b) The Percentages of Deaf-mutes among Children of Parents One of whom at least
was a Deaf-mute, for Marriages producing Five Children or More. (Compiled from
material in "Marriages of the Deaf in America," ed. E. A. Fay, Volta Bureau,
Washington, 1898.)

Percentage of    Number of      Percentage of    Number of
Deaf-mutes.      Families.      Deaf-mutes.      Families.

    0-20           220             60- 80           5.5
   20-40            20.5           80-100          15
   40-60            12
                                   Total          273





(c) Yield of Grain in pounds from Plots of 1/500th Acre in a Wheat Field. (Mercer
and Hall, "The Experimental Error of Field Trials," Journ. Agr. Science, vol. 4,
1911, p. 107.)

Yield of Grain in pounds
per 1/500th Acre.            Number of
(Central value of range.)    Plots.

        2.8                      4
        3.0                     15
        3.2                     20
        3.4                     47
        3.6                     63
        3.8                     78
        4.0                     88
        4.2                     69
        4.4                     59
        4.6                     35
        4.8                     10
        5.0                      8
        5.2                      4

      Total                    500


(d) The Frequencies of Different Numbers of Petals for Three Series of Ranunculus
bulbosus. (H. de Vries, Ber. deutsch. bot. Ges., Bd. 12, 1894, q.v. for details.)

Number                      Frequency.
of Petals.     Series A.    Series B.      Series C.

    5             312          345            133
    6              17           24         [illegible]
    7               4            7             23
    8               2            -              7
    9               2            2              2
   10               -            -              2
   11               -       [illegible]    [illegible]

  Total           337          380            222


6.6. A number of perfectly spherical balls, all of the same material, give a
symmetrical distribution when classified according to their diameters. Show that, 
if they are classified according to their weights, their frequency-distribution will 
be positively skew towards the higher weights. 

In the light of this result compare the distributions of Table 6.7 with the 
distributions of the table on p. 111. 

6.7. Toss a coin six times and note the number of heads. Repeat the 
experiment 100 times or more, and draw a frequency-polygon of your results 
classified according to the number of heads at each throw. 

6.8. Find the frequency-distribution of 200 bars of a waltz by Strauss classified 
according to the number of notes in the treble clef of each bar, and compare it 
with a similar distribution from modern waltzes. 

6.9. Examine qualitatively the effect on the distribution of Table 6.8 of an 
allowance for the fact that minors tend to overstate their age when marrying. 

6.10. The distribution of a herd of cows classified according to the quantity 
of milk produced by each cow per week is symmetrical. The distribution of the 
same herd classified according to the amount of butter-fat produced by each cow 
per week is negatively skew towards the lower quantities. Suggest a possible 
explanation for this fact. 





The Frequency-distribution of Weights for Adult Males born in England, Scotland,
Wales and Ireland. (Loc. cit., Table 6.7.) Weights were taken to the nearest
pound, consequently the true class-intervals are 89.5-99.5, 99.5-109.5, etc.

              Number of Men within given Limits of Weight.
Weight                     Place of Birth.
in lbs.      England.   Scotland.   Wales.   Ireland.    Total.

  90-            2          -          -         -           2
 100-           26          1          2         5          34
 110-          133          8         10         1         152
 120-          338         22         23         7         390
 130-          694         63         68        42         867
 140-         1240        173        153        57        1623
 150-         1075        255        178        51        1559
 160-          881        275        134        36        1326
 170-          492        168        102        25         787
 180-          304        125         34        13         476
 190-          174         67         14         8         263
 200-           75         24          7         1         107
 210-           62         14          8         1          85
 220-           33          7          1         -          41
 230-           10          4          2         -          16
 240-            9          2          -         -          11
 250-            3          4          1         -           8
 260-            1          -          -         -           1
 270-            -          -          -         -           -
 280-            -          -          1         -           1

 Total        5552       1212        738       247        7749




CHAPTER 7 . 

AVERAGES AND OTHER MEASURES OF LOCATION. 


The Principal Characteristics of Frequency-distributions. 

7.1. The condensation of data into a frequency-distribution is a first
and necessary step in rendering a long series of observations comprehensible.
But for practical purposes it is not enough, particularly when we want to
compare two or more different series. As a next step we wish to be able to
define quantitatively the characteristics of a frequency-distribution in as
few numbers as possible.

7.2. It might seem at first sight that very difficult cases of comparison
of two distributions could arise in which, for example, we had to contrast
a symmetrical distribution with a J-shaped distribution. In practice,
however, we rarely have to deal with such a case. Distributions drawn
from similar material are usually of similar form, as, for instance, when
we wish to compare the distributions of stature in two races of man, or
the birth-rates in English registration districts in two successive decades,
or the numbers of wealthy people in two different countries. The practical
use of the various statistical quantities which we shall discuss in this
and the next two chapters is based on this fact.

7.3. There are two fundamental characteristics in which similar
frequency-distributions may differ:

(1) They may differ markedly in position, i.e. in the value of the
variate round which they centre, as in fig. 7.1, A.

(2) They may differ in the extent to which the observations are dispersed
about the central value. Figs. 7.1, B and C, show cases in which
distributions differ in dispersion only, and in both dispersion and position,
respectively.

To these two characteristics we may add a third group of less importance,
comprising differences in skewness, peakedness, and so on.

Measures of the first character, i.e. position or location, are generally
known as averages. Measures of the second are termed measures of
dispersion. Measures of the properties in the third group have each
their appropriate name, which we shall give when we come to consider
them in detail.

The present chapter deals only with averages. Chapter 8 deals with
measures of dispersion, whilst Chapter 9 deals with the remaining
quantities.

Dimensions of an Average. 

7.4. In whatever way an average is defined, it may be as well to
note it is merely a certain value of the variable, and is therefore necessarily
of the same dimensions as the variable: i.e. if the variable be a
length, its average is a length; if the variable be a percentage, its average
is a percentage; and so on. But there are several different ways of
approximately defining the position of a frequency-distribution, that is,
there are several different forms of average, and the question therefore
arises, By what criteria are we to judge the relative merits of different
forms? What are, in fact, the desirable properties for an average to
possess?

Fig. 7.1.

Desiderata for a Satisfactory Average. 

7.5. (a) In the first place, it almost goes without saying that an
average should be rigidly defined, and not left to the mere estimation
of the observer. An average that was merely estimated would depend
too largely on the observer as well as the data.

(b) An average should be based on all the observations made. If not,
it is not really a characteristic of the whole distribution.

(c) It is desirable that the average should possess some simple and
obvious properties to render its general nature readily comprehensible:
an average should not be of too abstract a mathematical character.

(d) It is, of course, desirable that an average should be calculated
with reasonable ease and rapidity. Other things being equal, the easier
calculated is the better of two forms of average. At the same time
great weight must not be attached to mere ease of calculation, to the
neglect of other factors.

(e) It is desirable that the average should be as little affected as may
be possible by what we have termed fluctuations of sampling. If different
samples be drawn from the same material, however carefully they may
be taken, the averages of the different samples will rarely be quite the
same, but one form of average may show much greater differences than
another. Of the two forms, the more stable is the better. The full
discussion of this condition must, however, be postponed to a later section
of this work (Chap. 20).


(f) Finally, by far the most important desideratum is this, that the
measure chosen shall lend itself readily to algebraical treatment. If,
e.g., two or more series of observations on similar material are given,
the average of the combined series should be readily expressed in terms
of the averages of the component series; if a variable may be expressed
as the sum of two or more others, the average of the whole should be
readily expressed in terms of the averages of its parts. A measure for
which simple relations of this kind cannot be readily determined is likely
to prove of somewhat limited application.

7.6. There are three forms of average in common use, the arithmetic 
mean, the median and the mode, the first named being by far the 
most widely used in general statistical work. To these may be added 
the geometric mean and the harmonic mean, more rarely used, but 
of service in special cases. We will consider these in the order named.

The Arithmetic Mean. 

7.7. The arithmetic mean of a series of values of a variable
$X_1, X_2, X_3, \ldots, X_N$, $N$ in number, is the quotient of the sum of the
values by their number. That is to say, if $M$ be the arithmetic mean,

\[ M = \frac{1}{N}(X_1 + X_2 + X_3 + \ldots + X_N) \]

The arithmetic mean is also denoted by placing a bar over the variate
symbol, so that we may also write:

\[ \bar{X} = \frac{1}{N}(X_1 + X_2 + X_3 + \ldots + X_N) \]

To express these formulae more briefly by the use of the summation
symbol S,

\[ \bar{X} = M = \frac{1}{N} S(X) \tag{7.1} \]

The word mean or average alone, without qualification, is very generally
used to denote this particular form of average; that is to say, when anyone
speaks of "the mean" or "the average" of a series of observations, it may,
as a rule, be assumed that the arithmetic mean is meant.
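
The definition (7.1) can be sketched in a few lines of Python. This is only an illustration: the function name `arithmetic_mean` and the five wages are hypothetical and do not come from the text.

```python
from fractions import Fraction


def arithmetic_mean(values):
    """Return S(X)/N: the sum of the values divided by their number."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty series is undefined")
    return sum(values) / len(values)


# Hypothetical weekly wages (in shillings) of N = 5 workmen.
wages = [Fraction(w) for w in (30, 35, 40, 45, 50)]
mean_wage = arithmetic_mean(wages)

# The whole is recovered from the mean: N * M equals the wages-bill.
wages_bill = len(wages) * mean_wage
```

Exact fractions are used so that the relation between the mean and the total holds without rounding.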

7.8. It is evident that the arithmetic mean fulfils the conditions laid
down in (a) and (b) of 7.5, for it is rigidly defined and based on all the
observations made. Further, it fulfils condition (c), for its general nature
is readily comprehensible. If the wages-bill for $N$ workmen is £$P$, the
arithmetic mean wage, $P/N$ pounds, is the amount that each would
receive if the whole sum available were divided equally between them;
conversely, if we are told that the mean wage is £$M$, we know this means
that the wages-bill is $NM$ pounds. Similarly, if $N$ families possess a total
of $C$ children, the mean number of children per family is $C/N$, the number
that each family would possess if the children were shared uniformly.
Conversely, if the mean number of children per family is $M$, the total
number of children in $N$ families is $NM$. The arithmetic mean expresses,
in fact, a simple relation between the whole and its parts.

The mean is also satisfactory as regards conditions (e) and (f), but we
shall have to defer proof of this statement for the present.





Calculation of the Arithmetic Mean. 

7.9. As regards condition (d), simplicity of calculation, the mean takes
a high place. In the cases just cited, it will be noted that the mean is
actually determined without even the necessity of determining or noting
all the individual values of the variable: to get the mean wage we need not
know the wages of every hand, but only the wages-bill; to get the mean
number of children per family we need not know the number in each
family, but only the total. If this total is not given, but we have to deal
with a moderate number of observations, so few (say 30 or 40) that it is
hardly worth while compiling the frequency-distribution, the arithmetic
mean is calculated directly as suggested by the definition, i.e. all the values
observed are added together and the total divided by the number of
observations.

7.10. But if the number of observations be large, the process of
adding together all the values of the variate may be prohibitively lengthy.
It may be shortened considerably by forming the frequency-table and treating
all the values in each class as if they were identical with the mid-value
of the class-interval, a process which in general gives an approximation
that is quite sufficiently exact for practical purposes if the class-interval
has been taken moderately small. In this process each class-frequency
is multiplied by the mid-value of the interval, the products
added together, and the total divided by the number of observations. If
$f$ denote the frequency of any class, $X$ the mid-value of the corresponding
class-interval, the value of the mean so obtained may be written:

\[ M = \frac{1}{N} S(fX) \tag{7.2} \]
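
As a hedged illustration of formula (7.2), the sketch below applies it to the wheat-yield distribution tabulated in Exercise 6.5 (c). The function name `grouped_mean` is an assumption made for this example only.

```python
def grouped_mean(mid_values, frequencies):
    """Approximate mean S(fX)/N, each class being taken at its mid-value X."""
    n = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, mid_values)) / n


# Yield of grain in pounds per plot (central values of the ranges) and the
# numbers of plots, as given in Exercise 6.5 (c).
mid_values = [2.8, 3.0, 3.2, 3.4, 3.6, 3.8, 4.0, 4.2, 4.4, 4.6, 4.8, 5.0, 5.2]
frequencies = [4, 15, 20, 47, 63, 78, 88, 69, 59, 35, 10, 8, 4]

mean_yield = grouped_mean(mid_values, frequencies)  # about 3.95 lb per plot
```

Thirteen multiplications replace the addition of 500 separate yields, which is the saving the paragraph describes.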

7.11. But this procedure is still further abbreviated in practice by
the following artifices: (1) The class-interval is treated as the unit of
measurement throughout the arithmetic; (2) the difference between the
mean and the mid-value of some arbitrarily chosen class-interval is computed
instead of the absolute value of the mean.

If $A$ be the arbitrarily chosen value and

\[ X = A + \xi \tag{7.3} \]

then

\[ S(fX) = S(fA) + S(f\xi) \]

or, since $A$ is a constant,

\[ M = A + \frac{1}{N} S(f\xi) \tag{7.4} \]

The calculation of $S(fX)$ is therefore replaced by the calculation of
$S(f\xi)$. The advantage of this is that the class-frequencies need only be
multiplied by small integral numbers; for $A$ being the mid-value of a
class-interval, and $X$ the mid-value of another, and the class-interval being
treated as a unit, the $\xi$'s must be a series of integers proceeding from zero
at the arbitrary origin $A$. To keep the values of $\xi$ as small as possible, $A$
should be chosen near the middle of the range.

It may be mentioned here that $\frac{1}{N}S(\xi)$, or $\frac{1}{N}S(f\xi)$ for
the grouped distribution, is sometimes termed the first moment of the
distribution about the arbitrary origin $A$.

Example 7.1. As an example, let us find the arithmetic mean of the
heights in the distribution of Table 6.7. In this case the class-interval is
a unit (1 inch), so the value of $M - A$ is given directly by dividing $S(f\xi)$
by $N$. The student must notice that, measures having been made to the
nearest eighth of an inch, the mid-values of the intervals are 57 7/16,
58 7/16, etc., and not 57.5, 58.5, etc.

Calculation of the Mean: Calculation of the Arithmetic Mean Stature of Male
Adults in the British Isles from the Figures of Table 6.7, p. Vi.

   (1)           (2)             (3)                (4)
 Height,      Frequency     Deviation from       Product
 Inches.         f.         Arbitrary Value        fξ.
                                A, ξ.

   57-            2             -10                  20
   58-            4             - 9                  36
   59-           14             - 8                 112
   60-           41             - 7                 287
   61-           83             - 6                 498
   62-          169             - 5                 845
   63-          394             - 4                1576
   64-          669             - 3                2007
   65-          990             - 2                1980
   66-         1223             - 1                1223
                                                  -----
                                                  -8584
   67-         1329               0                   0
   68-         1230             + 1                1230
   69-         1063             + 2                2126
   70-          646             + 3                1938
   71-          392             + 4                1568
   72-          202             + 5                1010
   73-           79             + 6                 474
   74-           32             + 7                 224
   75-           16             + 8                 128
   76-            5             + 9                  45
   77-            2             +10                  20

  Total        8585               -               +8763

\[ S(f\xi) = +8763 - 8584 = +179 \]

\[ M - A = +\frac{179}{8585} = +0.02 \text{ class-intervals or inches} \]

\[ M = 67\tfrac{7}{16} + 0.02 = 67.46 \text{ inches} \]
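
The arithmetic of Example 7.1 can be retraced with a short script. This is a sketch only; the variable names are chosen for the illustration, and the frequencies are those of the table above.

```python
# Class lower limits 57-, 58-, ..., 77- (inches) and their frequencies.
lower_limits = list(range(57, 78))
frequencies = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
               1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]

# Arbitrary origin A: the mid-value of the class "67-", i.e. 67 7/16 inches.
A = 67 + 7 / 16

# Deviations xi in class-intervals from the origin: -10, ..., 0, ..., +10.
xi = [h - 67 for h in lower_limits]

N = sum(frequencies)
S_f_xi = sum(f * d for f, d in zip(frequencies, xi))

# The class-interval is 1 inch, so S(f xi)/N is already in inches (eq. 7.4).
M = A + S_f_xi / N
```

Only small integers are multiplied by the frequencies, which is the practical advantage of working from an arbitrary origin.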

7.12. As calculations of the mean constantly have to be made, the
student should familiarise himself with the process we have just illustrated,
and note that a check can always be effected on the arithmetic in the
following way:

Since

\[ f(\xi + 1) = f\xi + f \]

\[ S\{f(\xi + 1)\} = S(f\xi) + S(f) \]

\[ S\{f(\xi + 1)\} - S(f\xi) = S(f) = \text{Total frequency} \]

Hence, if we tabulate the values of $f(\xi + 1)$ as well as those of $f\xi$ and
find their totals, the difference must, if the arithmetic is correct, be equal
to the total frequency.

7.13. It will be evident that a classification by unequal intervals is,
at best, a hindrance in the calculation of the mean, and the use of an
indefinite interval at the end of the distribution renders exact calculation
impossible. The following example illustrates the calculation for unequal
class-intervals and the arithmetical check to which we have just referred.

Example 7.2. Data from Table 6.11, page 100. What is the average
age at death from scarlet fever?

Here there is a change of the class-interval at the five-year point. We
take a year to be the unit, and the centre of the interval 5-10 years as an
arbitrary origin, which means that $A = 7.5$ years.


Calculation of the Mean: Calculation of the Arithmetic Mean Age of Persons Dying
from Scarlet Fever in the United Kingdom in 19 iS (Table 6.11, p. 100)

  Age,      Frequency,    Deviation from A,
  Years.        f.               ξ.              fξ.        f(ξ+1).

   0-           16              - 7             - 112        -  96
   1-           69              - 6             - 414        - 345
   2-           89              - 5             - 445        - 356
   3-           74              - 4             - 296        - 222
   4-           74              - 3             - 222        - 148
                                                -----        -----
                                                -1489        -1167
   5-          213                0                 0          213
  10-           70                5               350          420
  15-           27               10               270          297
  20-           26               15               390          416
  25-           17               20               340          357
  30-           12               25               300          312
  35-           11               30               330          341
  40-           10               35               350          360
  45-            6               40               240          246
  50-            7               45               315          322
  55-            5               50               250          255
  60-            -               55                 -            -
  65-            1               60                60           61
  70-            1               65                65           66
  75-            1               70                70           71

  Total        729                -             +3330        +3737

Hence,

\[ S(f\xi) = 3330 - 1489 = 1841 \]

and

\[ S\{f(\xi + 1)\} = 3737 - 1167 = 2570 \]

and the difference $2570 - 1841 = 729$, as it should.

Hence,

\[ M - A = +\frac{1841}{729} = 2.525 \text{ years} \]

and

\[ M = 7.5 + 2.525 = 10.025 \text{ years} \]
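
Example 7.2, including the arithmetical check of 7.12, may be sketched as follows. The variable names are illustrative, and the closing limit of 80 years for the last class is an assumption implied by the tabulated deviation of 70 (mid-value 77.5).

```python
# Lower class limits: single years up to 5, then five-year groups 5-, ..., 75-.
lower = [0, 1, 2, 3, 4] + list(range(5, 80, 5))
upper = lower[1:] + [80]  # assumption: the last class is taken as 75-80
frequencies = [16, 69, 89, 74, 74, 213, 70, 27, 26, 17,
               12, 11, 10, 6, 7, 5, 0, 1, 1, 1]

mid_values = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]
A = 7.5  # arbitrary origin: centre of the interval 5-10 years
xi = [x - A for x in mid_values]  # deviations in years: -7, ..., 0, 5, ..., 70

N = sum(frequencies)
S_f_xi = sum(f * d for f, d in zip(frequencies, xi))
S_f_xi_plus_1 = sum(f * (d + 1) for f, d in zip(frequencies, xi))

# Check of 7.12: the two totals must differ by the total frequency.
check = S_f_xi_plus_1 - S_f_xi

M = A + S_f_xi / N  # mean age at death, in years
```

Because the unit is one year rather than the class-interval, the deviations are no longer consecutive integers, but the check on the totals works in exactly the same way.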








7.14. We return again below, in 7.16 (c), to the question of the errors
caused by the assumption that all values within the same interval may be
treated as approximately the mid-value of the interval. It is sufficient to
say here that the error is in general very small and of uncertain sign for a
distribution of the symmetrical or only moderately asymmetrical type,
provided, of course, the class-interval is not large. In the case of the
"J-shaped" or extremely asymmetrical distribution, however, the error is
evidently of definite sign, for in all the intervals the frequency is piled up
at the limit lying towards the greatest frequency, i.e. the lower end of the
range in the case of the illustrations given in Chapter 6, and is not evenly
distributed over the interval. In distributions of such a type the intervals
must be made very small indeed to secure an approximately accurate value
for the mean. The student should test for himself the effect of different
groupings in two or three different cases, so as to get some idea of the degree
of inaccuracy to be expected.

7.15. If a diagram has been drawn representing the frequency-distribution,
the position of the mean may conveniently be indicated by a
vertical through the corresponding point on the base. In a moderately
asymmetrical distribution the mean lies on the side of the greatest frequency
towards the longer "tail" of the distribution: M in fig. 7.2 shows the
position of the mean in an ideal distribution. In a symmetrical distribution
the mean coincides with the centre of symmetry. The student should
mark the position of the mean in the diagram of every frequency-distribution
that he draws, and so accustom himself to thinking of the mean
not as an abstraction, but always in relation to the frequency-distribution
of the variable concerned.

Fig. 7.2. Mean M, Median Mi and Mode Mo of the Ideal Moderately
Asymmetrical Distribution.

Properties of the Arithmetic Mean. 

7.16. The following are important properties of the arithmetic mean,
and the examples illustrate the facility of its algebraic treatment:

(a) The sum of the deviations from the mean, taken with their proper
signs, is zero.

This follows at once from equation (7.4): for if $M$ and $A$ are identical,
evidently $S(f\xi)$ must be zero.
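
Property (a) is easily verified numerically. The observations below are hypothetical, and exact fractions are used so that the sum is exactly zero rather than zero to within rounding.

```python
from fractions import Fraction

# Hypothetical observations; any series would serve.
values = [Fraction(v) for v in (3, 7, 8, 12, 20)]

M = sum(values) / len(values)        # arithmetic mean
deviations = [x - M for x in values]  # deviations taken with their proper signs
total_deviation = sum(deviations)
```

Taking the origin at any value other than the mean leaves a non-zero total, equal to N times the difference between the mean and that origin.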



(b) If a series of $N$ observations of a variable $X$ consist of, say, two
component series, the mean of the whole series can be readily expressed
in terms of the means of the two components. For if we denote the values
in the first series by $X_1$ and in the second series by $X_2$,

\[ S(X) = S(X_1) + S(X_2) \]

that is, if there be $N_1$ observations in the first series and $N_2$ in the second,
and the means of the two series be $M_1$, $M_2$, respectively,

\[ NM = N_1 M_1 + N_2 M_2 \tag{7.5} \]

For example, we find from the data of Table 6.7,

Mean stature of the 346 men born in Ireland = 67.78 inches
Mean stature of the 741 men born in Wales   = 66.62 inches

Hence the mean stature of the 1087 men born in the two countries is given
by the equation

\[ 1087 M = (346 \times 67.78) + (741 \times 66.62) \]

that is, $M = 66.99$ inches.

It is evident that the form of the relation (7.5) is quite general:
if there are r series of observations X_1, X_2, . . . X_r, the mean M of the
whole series is related to the means M_1, M_2, . . . M_r of the component
series by the equation

    NM = N_1 M_1 + N_2 M_2 + . . . + N_r M_r . . (7.6)

For the convenient checking of arithmetic, it is useful to note that, if the
same arbitrary origin A for the deviations ξ be taken in each case, we must
have, denoting the component series by the subscripts 1, 2, . . . r as
before,

    S(fξ) = S(f_1 ξ_1) + S(f_2 ξ_2) + . . . + S(f_r ξ_r) . . (7.7)

The agreement of these totals accordingly checks the work.

As an important corollary to the general relation (7.6), it may be noted 
that the approximate value for the mean obtained from any frequency- 
distribution is the same whether we assume (1) that all the values in any 
class are identical with the mid-value of the class-interval, or (2) that the 
mean of the values in the class is identical with the mid-value of the class- 
interval. 
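The relation (7.6) is easily checked by machine. The following Python sketch is offered only as an illustration; the function name combined_mean is our own and not part of the text, and the figures are those quoted above for the men born in Ireland and Wales.

```python
def combined_mean(counts, means):
    """Mean M of the pooled series, from N M = N_1 M_1 + ... + N_r M_r (7.6)."""
    total = sum(counts)
    return sum(n * m for n, m in zip(counts, means)) / total

# 346 men born in Ireland (mean 67.78 in.), 741 born in Wales (mean 66.62 in.)
pooled = combined_mean([346, 741], [67.78, 66.62])
print(round(pooled, 2))
```

Only the numbers of observations and the means of the components are required; the individual observations need not be known.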

(c) The mean of all the sums or differences of corresponding observations
in two series (of equal numbers of observations) is equal to the sum
or difference of the means of the two series.

This follows almost at once. For if

    X = X_1 ± X_2

    S(X) = S(X_1) ± S(X_2)

That is, if M, M_1, M_2 be the respective means,

    M = M_1 ± M_2 . . . . (7.8)




Evidently the form of this result is again quite general, so that if

    X = X_1 ± X_2 ± . . . ± X_r

    M = M_1 ± M_2 ± . . . ± M_r . . . (7.9)

As a useful illustration of equation (7.8), consider the case of measurements
of any kind that are subject (as indeed all measures must be) to greater or
less errors. The actual measurement X in any such case is the algebraic
sum of the true measurement X_1 and an error X_2. The mean of the actual
measurements M is therefore the sum of the true mean M_1 and the
arithmetic mean of the errors M_2. If, and only if, the latter be zero, will
the observed mean be identical with the true mean. Errors of grouping
(7.14) are a case in point.

The Median. 

7.17. The median may be defined as the middlemost or central value
of the variable when the values are ranged in order of magnitude, or as the
value such that greater and smaller values occur with equal frequency. In
the case of a frequency-curve, the median may be defined as that value of
the variable the vertical through which divides the area of the curve into
two equal parts, as the vertical through Mi in fig. 7.2.

The median, like the mean, fulfils the conditions (b) and (c) of 7.5,
seeing that it is based on all the observations made, and that it possesses
the simple property of being the central or middlemost value, so that its
nature is obvious.

7.18. But the definition does not necessarily lead in all cases to a
determinate value. If there be an odd number of different values of X
observed, say 2n + 1, the (n + 1)th in order of magnitude is the only value
fulfilling the definition. But if there be an even number, say 2n different
values, any value between the nth and (n + 1)th fulfils the conditions. In
such a case it appears to be usual to take the mean of the nth and (n + 1)th
values as the median, but this is a convention supplementary to the
definition.

7.19. It should also be noted that in the case of a discontinuous
variable the second form of the definition in general breaks down: if we
range the values in order there is always a middlemost value (provided the
number of observations be odd), but there is not, as a rule, any value such
that greater and less values occur with equal frequency. Thus, in Table
6.2 we see that 45 per cent. of the poppy capsules had 12 or fewer stigmatic
rays, 55 per cent. had 13 or more; similarly, 61 per cent. had 13 or fewer
rays, 39 per cent. had 14 or more. There is no number of rays such that
the frequencies in excess and defect are equal. In the case of the buttercups
of Exercise 6.5 (d), page 110, there is no number of petals that even
remotely fulfils the required condition. An analogous difficulty may arise,
it may be remarked, even in the case of an odd number of observations of a
continuous variable if the number of observations be small and several of
the observed values identical.

The median is therefore a form of average of most uncertain meaning in
cases of strictly discontinuous variation, for it may be exceeded by 5, 10,
15 or 20 per cent. only of the observed values, instead of by 50 per cent.:
its use in such cases is to be deprecated, and is perhaps best avoided in any
case, whether the variation be continuous or discontinuous, in which small
series of observations have to be dealt with.

Determination of the Median. 

7.20. When all the values of the variate are given and the total
frequency is small, the median can be determined by inspection as the
middlemost value or, if there is no such value, as the mean of the two
middlemost values. When the distribution is given as a frequency-distribution,
however, a certain amount of approximation is necessary, as in
the case of the calculation of the mean.

For the frequency-distribution of a continuous variable a sufficiently 
approximate value of the median can be obtained by interpolation. If 
the total frequency is large it is sufficient to assume that the values in each 
class are uniformly distributed throughout the interval. 

Example 7.3. Let us determine the median of the distribution whose
mean we found in Example 7.1. The work may be indicated thus:

    Half the total number of observations (8585) . = 4292.5
    Total frequency under 66 15/16 inches . . . . = 3589

    Difference . . . . . . . . . . . . . . = 703.5
    Frequency in next interval . . . . . . . . = 1329

Hence we take the median to be

    66 15/16 + 703.5/1329 inches = 67.47 inches

The difference between the median and mean in this case is therefore
only about one-hundredth of an inch.
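The interpolation may be set out as a short Python sketch. It is an illustration under the stated assumption that the values in the median class are uniformly distributed over the interval; the function name and its arguments are our own. The figures are the cumulative frequencies of Example 7.1: 3589 observations below 66 15/16 inches and 4918 below 67 15/16 inches, so that 1329 fall in the interval containing the median.

```python
def median_by_interpolation(n_total, lower_limit, cum_below, class_freq, width=1.0):
    """Interpolated median, assuming a uniform spread of values within the class."""
    half = n_total / 2
    return lower_limit + width * (half - cum_below) / class_freq

# Statures of Example 7.1: N = 8585, median class begins at 66 15/16 inches
stature_median = median_by_interpolation(8585, 66 + 15 / 16, 3589, 4918 - 3589)
print(round(stature_median, 2))
```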

Example 7.4. To find the median of the distribution of Example 7.2.


Half the total number of observations . 801*5 

Total frequency under 5 years . . . 822 


Difference . . . . . . = 42*5 

Frequency in next interval . . . 218 

Hence we take the median to be : 


“ 6 years 

Here the median is very far from coinciding with the mean. 

Graphical Determination of the Median. 

7.21. Graphical interpolation may, if desired, be substituted for
arithmetical interpolation. Taking the figures of Example 7.1, we see
that the number of men with height less than 65 15/16 inches is 2366, less than
66 15/16 is 3589, less than 67 15/16 is 4918, and less than 68 15/16 is 6148.

Plot the numbers of men with height not exceeding each value of X
to the corresponding value of X on squared paper, to a good large scale,
as in fig. 7.3, and draw a smooth curve through the points thus obtained,
preferably with the aid of one of the “curves,” splines or flexible curves
sold by instrument-makers for the purpose. The point at which the
smooth curve so obtained cuts the horizontal line corresponding to a
total frequency N/2 = 4292.5 gives the median. In general the curve is
so flat that the value obtained by this graphical method does not differ
appreciably from that calculated arithmetically (the arithmetical process
assuming that the curve is a straight line between the points on either
side of the median); if the curvature is considerable, the graphical value,
assuming, of course, careful and accurate draughtsmanship, is to be
preferred to the arithmetical value, as it does not involve the crude
assumption that the frequency is uniformly distributed over the interval
in which the median lies.

Fig. 7.3. Determination of the Median by Graphical Interpolation.

Comparison of the Mean and the Median. 

7.22. If we adopt the convention that the median of an even number
of observations is midway between the two central values, both the
mean and the median satisfy the first three of the desiderata we enumerated
in 7.5; that is to say, they are rigidly defined, based on all the observations,
and are readily comprehensible. In the remaining three, however,
they differ considerably.

7.23. As regards ease of calculation, the median has distinct advan¬ 
tages over the mean. 

Whether the stability of the median under fluctuations of sampling 
is greater than that of the mean depends to some extent on the 
form of the distribution which is being sampled. In general, the mean 
is the more stable, but cases occur in which the median is preferable 
(cf. 7.24 (d) below, and Chap. 20). 

When, however, the ease of algebraical treatment of the two forms
of average is compared, the superiority lies wholly on the side of the mean.
As was shown in 7.16, when several series of observations are combined
into a single series, the mean of the resultant distribution can be simply
expressed in terms of the means of the components. Expression of
the median of the resultant distribution in terms of the medians of the
components is, however, not merely complex and difficult, but usually
impossible: the value of the resultant median depends on the forms of the
component distributions, and not on their medians alone. If two symmetrical
distributions of the same form and with the same numbers of
observations, but with different medians, be combined, the resultant median
must evidently (from symmetry) coincide with the resultant mean, i.e. lie
half-way between the means of the components. But if the two components
be asymmetrical, or (whatever their form) if the degrees of
dispersion or numbers of observations in the two series be different, the
resultant median will not coincide with the resultant mean, nor with
any other simply assignable value. It is impossible, therefore, to give
any theorem for medians analogous to equations (7.5) and (7.6) for
means. It is equally impossible to give any theorem analogous to
equations (7.8) and (7.9) of 7.16. The median of the sum or difference
of pairs of corresponding observations in two series is not, in general,
equal to the sum or difference of the medians of the two series; the
median value of a measurement subject to error is not necessarily identical
with the true median, even if the median error be zero, i.e. if positive
and negative errors be equally frequent.
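A small numerical illustration may make the contrast plain. The Python sketch below uses invented series (they are not taken from any table in this book) and the standard library functions statistics.mean and statistics.median. Two component series with the same median give different pooled medians, and the median of the sums of paired observations differs from the sum of the medians, while the corresponding relation for means holds.

```python
from statistics import mean, median

# Two hypothetical series with the same median (5) but different forms
a = [1, 2, 3]
b1 = [4, 5, 6]
b2 = [0, 5, 100]

pooled_1 = median(a + b1)   # pooled with the first form
pooled_2 = median(a + b2)   # pooled with the second form: a different result

# Paired observations: the median of the sums is not the sum of the medians
x1 = [1, 2, 10]
x2 = [10, 1, 2]
sums = [u + v for u, v in zip(x1, x2)]
median_of_sums = median(sums)
sum_of_medians = median(x1) + median(x2)

# The mean, by (7.8), behaves simply
mean_gap = abs(mean(sums) - (mean(x1) + mean(x2)))
print(pooled_1, pooled_2, median_of_sums, sum_of_medians, mean_gap)
```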

7.24. These limitations render the applications of the median in
any work in which theoretical considerations are necessary comparatively
circumscribed. On the other hand, the median may have an advantage
over the mean for special reasons.

(a) It is very readily calculated ; a factor to which, however, as 
already stated, too much weight ought not to be attached. 

(b) It is readily obtained, without the necessity of measuring all the
objects to be observed, in any case in which the objects can be arranged
in order of magnitude. If, for instance, a number of men be ranked in
order of stature, the stature of the middlemost is the median, and he
alone need be measured. (On the other hand, it is useless in the cases
cited at the end of 7.8; the median wage cannot be found from the
total of the wages-bill, and the total of the wages-bill is not known when
the median is given.)

(c) It is sometimes useful as a makeshift, when the observations are
so given that the calculation of the mean is impossible, owing, e.g., to a
final indefinite class.

(d) The median may sometimes be preferable to the mean, owing to
its being less affected by abnormally large or small values of the variable.
The stature of a giant would have no more influence on the median
stature of a number of men than the stature of any other man whose
height is only just greater than the median. If a number of men enjoy
incomes closely clustering round a median of £500 a year, the median
will be no more affected by the addition to the group of a man with an
income of £50,000 than by the addition of a man with an income of £5000,
or even £600. If observations of any kind are liable to present occasional
greatly outlying values of this sort (whether real, or due to errors or
blunders), the median will be more stable and less affected by fluctuations
of sampling than the arithmetic mean (cf. Chap. 20).

(e) It may be added that the median is, in a certain sense, a particularly
real and natural form of average, for the object or individual that
is the median object or individual on any one system of measuring the
character with which we are concerned will remain the median on any
other method of measurement which leaves the objects in the same relative
order. Thus a batch of eggs representing eggs of the median price,
when prices are reckoned at so much per dozen, will remain a batch
representing the median price when prices are reckoned at so many eggs
to the shilling.
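Properties (d) and (e) can be seen at work in a few lines of Python. The sketch is illustrative only: the incomes and the egg prices are invented figures, and we assume twelve pence to the shilling, so that a price of p pence per dozen corresponds to 144/p eggs to the shilling.

```python
from statistics import mean, median

# (d) Incomes clustering round 500 pounds a year (hypothetical figures)
incomes = [480, 490, 500, 510, 520]
median_with_600 = median(incomes + [600])
median_with_50000 = median(incomes + [50000])   # same median as with 600
mean_with_600 = mean(incomes + [600])
mean_with_50000 = mean(incomes + [50000])       # the mean is greatly altered

# (e) Five batches of eggs priced in pence per dozen (hypothetical figures)
prices_per_dozen = [9, 12, 16, 18, 24]
eggs_per_shilling = [144 / p for p in prices_per_dozen]

# Position of the median batch under each way of reckoning the price
batch_by_price = prices_per_dozen.index(median(prices_per_dozen))
batch_by_eggs = eggs_per_shilling.index(median(eggs_per_shilling))
print(median_with_600, median_with_50000, batch_by_price, batch_by_eggs)
```

The same batch is the median on either reckoning, whereas the mean price per dozen (15.8 pence) does not correspond to the mean number of eggs to the shilling (10.2).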

The Mode. 

7.25. The mode is the value of the variable corresponding to the
maximum of the ideal curve which gives the closest possible fit to the
actual distribution. It represents the value which is most frequent or
typical, the value which is, in fact, the fashion (la mode).¹ The mode
is sometimes denoted by writing the sign ˘ over the variate symbol, e.g.
X̆ means the mode of the values X_1, X_2, . . . X_N.

There is evidently something anticipatory about this definition, for
we have not yet defined what we mean by “closest possible fit.” For
the present the student must content himself with intuitive ideas on this
head. Nor have we given a method of finding the curve of closest fit,
which would be a necessary preliminary to ascertaining the mode.

7.26. It is, in fact, difficult to determine the mode for such distributions
as arise in practice, particularly by elementary methods. It is no
use giving merely the mid-value of the class-interval into which the
greatest frequency falls, for this is entirely dependent on the choice of
the scale of class-intervals. It is no use making the class-intervals very
small to avoid error on that account, for the class-frequencies will then
become small and the distribution irregular. What we want to arrive
at is the mid-value of the interval for which the frequency would be a
maximum, if the intervals could be made indefinitely small, and at the
same time the number of observations be so increased that the class-frequencies
should run smoothly. As the observations cannot, in a
practical case, be indefinitely increased, it is evident that some process
of smoothing out the irregularities that occur in the actual distribution
must be adopted, in order to ascertain the approximate value of the mode.
But there is only one smoothing process that is really satisfactory, in so
far as every observation can be taken into account in the determination,
and that is the method of fitting an ideal frequency-curve of given equation
to the actual figures. The value of the variable corresponding to the
maximum of the fitted curve is then taken as the mode, in accordance
with our definition. The determination of the mode by this, the only
strictly satisfactory method, must, however, be left to the more advanced
student. The methods of curve-fitting which we shall discuss in Chapter 17
are not appropriate to the fitting of frequency-curves, but we give an
approximate method which is of use in certain cases in 24.21.

¹ Unless we state expressly to the contrary, we shall be thinking of single-humped
distributions in talking of “the” mode. When the distribution is of the complicated
form of fig. 6.17 there may be more than one mode. Such distributions are therefore
sometimes called multimodal. The mean and the median are still unique for such
distributions.

Empirical Relation between Mean, Median and Mode. 

7.27. For a symmetrical distribution, mean, median and mode
coincide, as will be evident on a little consideration. For other distributions,
as a rule, they do not. Fig. 7.2 shows the position of the three
in a moderately skew distribution.

There is an approximate relation between mean, median and mode
which appears to hold good with surprising closeness for moderately
asymmetrical distributions, approaching the ideal type of fig. 6.7, and it
is one that should be borne in mind as giving, roughly at all events,
the relative values of these three averages for a great many cases with
which the student will have to deal. It is expressed by the equation

    Mode = Mean - 3(Mean - Median)

That is to say, the median lies one-third of the distance mean to mode
from the mean towards the mode.

The following table gives the true mode and the mode calculated in
accordance with the above formula for certain skew distributions of the
type of fig. 6.10:

Comparison of the Approximate and True Modes in the Case of Five Distributions of the
Height of the Barometer for Daily Observations at the Stations named. (Distributions
given by Karl Pearson and Alice Lee, Phil. Trans., A, vol. 190, 1897, p. 423.)

    Station.         Mean.     Median.    Approximate    True Mode.
                                          Mode.
    Southampton .    29.981    30.000     30.038         30.039
    Londonderry .    29.891    29.915     29.963         29.960
    Carmarthen  .    29.952    29.974     30.018         30.013
    Glasgow . . .    29.886    29.906     29.946         29.967
    Dundee  . . .    29.870    29.890     29.930         29.951


It will be seen that the true and approximate values are extremely
close, except in the case of Dundee and Glasgow, where the divergence
reaches two-hundredths of an inch.
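The column of approximate modes may be recomputed from the means and medians of the table. The Python sketch below is an illustration only; the function name approximate_mode is our own, and the figures are those of the table for the five stations.

```python
def approximate_mode(mean, median):
    """Empirical relation of 7.27: Mode = Mean - 3(Mean - Median)."""
    return mean - 3 * (mean - median)

# station: (mean, median, true mode), heights of the barometer in inches
stations = {
    "Southampton": (29.981, 30.000, 30.039),
    "Londonderry": (29.891, 29.915, 29.960),
    "Carmarthen": (29.952, 29.974, 30.013),
    "Glasgow": (29.886, 29.906, 29.967),
    "Dundee": (29.870, 29.890, 29.951),
}

approx = {name: approximate_mode(m, med) for name, (m, med, _) in stations.items()}
# Divergence of the true mode from the approximate mode at each station
divergence = {name: stations[name][2] - approx[name] for name in stations}
for name in stations:
    print(name, round(approx[name], 3), round(divergence[name], 3))
```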

7.28. Summing up the preceding paragraphs, we may say that the
mean is the form of average to use for all general purposes; it is simply
calculated, its value is nearly always determinate, its algebraic treatment is
particularly easy, and in most cases it is rather less affected than the
median by errors of sampling. The median is, it is true, somewhat more
easily calculated from a given frequency-distribution than is the mean;
it is sometimes a useful makeshift, and in a certain class of cases it is
more and not less stable than the mean; but its use is undesirable in
cases of discontinuous variation, its value may be indeterminate, and its
algebraic treatment is difficult and often impossible. The mode, finally,
is a form of average hardly suitable for elementary use, owing to the
difficulty of its determination, but at the same time it represents an
important value of the variable. The arithmetic mean should invariably
be employed unless there is some very definite reason for the choice of
another form of average, and the elementary student will do very well
if he limits himself to its use. Objection is sometimes taken to the use
of the mean in the case of asymmetrical frequency-distributions, on the
ground that the mean is not the mode, and that its value is consequently
misleading. But no one in the least degree familiar with the manifold
forms taken by frequency-distributions would regard the two as in general
identical; and while the importance of the mode is a good reason for
stating its value in addition to that of the mean, it cannot replace the
latter. The objection, it may be noted, would apply with almost equal
force to the median, for, as we have seen (7.27), the difference between
mode and median is usually about two-thirds of the difference between
mode and mean.

The Geometric Mean. 

7.29. The geometric mean G of a series of values X_1, X_2, X_3, . . . X_N
is defined by the relation

    G = (X_1 X_2 X_3 . . . X_N)^{1/N} . . . (7.10)

The definition may also be expressed in terms of logarithms:

    log G = (1/N) S(log X) . . . . (7.11)

that is to say, the logarithm of the geometric mean of a series of values
is the arithmetic mean of their logarithms.

The geometric mean of a given series of positive quantities, not all equal,
is always less than their arithmetic mean; the student will find a proof in
most textbooks of algebra, and in ref. (105). The magnitude of the difference
depends largely on the amount of dispersion of the variable in proportion to the
magnitude of the mean (cf. Exercise 8.12, p. 153). The geometric mean is
necessarily zero, it should be noticed, if even a single value of X is zero,
and it may become imaginary if negative values occur.
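Equation (7.11) suggests the method of computation directly. The following Python sketch is illustrative; the function name geometric_mean is our own, natural logarithms are used (any base serves equally), and the three values are invented. The sketch assumes strictly positive values, in keeping with the remark above on zero and negative values.

```python
import math

def geometric_mean(values):
    """G from (7.11): antilogarithm of the arithmetic mean of the logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("this sketch assumes strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

values = [2, 4, 8]                       # hypothetical series; product 64
g = geometric_mean(values)               # cube root of 64, i.e. 4 (to rounding)
a = sum(values) / len(values)            # arithmetic mean, 4.67 nearly
print(round(g, 6), round(a, 6))
```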

Calculation of the Geometric Mean. 

7.30. From equation (7.11) it will be evident that the calculation of
the geometric mean is exactly the same as that of the arithmetic mean,
except that instead of adding the values of the variable we add the
logarithms of those values. If there are many values we can draw up
a frequency table for the logarithms and proceed as in Examples 7.1
and 7.2.




Properties of the Geometric Mean.

7.31. The geometric mean is rigidly defined and takes account of 
all the observations. It is also fairly easily calculated, though not so 
easily as the arithmetic mean. It has, however, no simple and obvious 
properties which render its general nature readily comprehensible. This, 
coupled with its rather abstract mathematical character, has prevented 
it from coming into general use as a representative average. 

7.32. At the same time, as the following examples show, the geometric
mean possesses some important properties, and is readily treated
algebraically in certain cases.

(a) If the series of observations X consist of r component series, there
being N_1 observations in the first, N_2 in the second, and so on, the geometric
mean G of the whole series can be readily expressed in terms of
the geometric means G_1, G_2, etc., of the component series. For evidently
we have at once (as in 7.16 (b)):

    N log G = N_1 log G_1 + N_2 log G_2 + . . . + N_r log G_r . (7.12)

(b) The geometric mean of the ratios of corresponding observations
in two series is equal to the ratio of their geometric means. For if

    X = X_1/X_2

    log X = log X_1 - log X_2

then, summing for all pairs of X_1's and X_2's and dividing by N:

    G = G_1/G_2 . . . . (7.13)

(c) Similarly, if a variable X is given as the product of any number of
others, i.e. if

    X = X_1 X_2 X_3 . . . X_r

X_1, X_2, . . . X_r denoting corresponding observations in r different series,
the geometric mean G of X is expressed in terms of the geometric means
G_1, G_2, . . . G_r of X_1, X_2, . . . X_r by the relation

    G = G_1 G_2 G_3 . . . G_r . . . (7.14)

That is to say, the geometric mean of the product is the product of the
geometric means.

7.33. The geometric mean finds applications in several cases where
we have to deal with a quantity whose changes tend to be directly proportional
to the quantity itself, e.g. populations; or where we are dealing
with an average of ratios, as in index-numbers of prices. Suppose,
for instance, we wish to estimate the numbers of a population midway
between two epochs (say two census years) at which the population is
known. If nothing is known concerning the increase of the population
save that the numbers recorded at the first census were P_0 and at the
second census n years later P_n, the most reasonable assumption to make
is that the percentage increase in each year has been the same, so that
the populations in successive years form a geometric series, P_0 r being
the population a year after the first census, P_0 r^2 two years after the first
census, and so on, so that

    P_n = P_0 r^n . . . . (7.15)




The population midway between the two censuses is therefore

    P_{n/2} = P_0 r^{n/2} = (P_0 P_n)^{1/2} . . . (7.16)

i.e. the geometric mean of the numbers given by the two censuses. This
result must, however, be used with discretion. The rate of increase of
population is not necessarily, or even usually, constant over any considerable
period of time: if it were so, a curve representing the growth of
population as in fig. 7.4 would be everywhere convex to the base, whether
the population were increasing or decreasing. In the diagram it will be
seen that the curves are frequently concave towards the base, and similar
results will often be found for districts in which the population is not
increasing very rapidly, and from which there is much emigration.

Fig. 7.4. Showing the Populations of certain Rural Counties of England
for each Census Year from 1801 to 1901.

Further, the assumption is not self-consistent in any case in which the
rate of increase is not uniform over the entire area, and almost any area
can be analysed into parts which are not similar in this respect. For if
in one part of the area considered the initial population is P_0 and the
common ratio R, and in the remainder of the area the initial population
is p_0 and the common ratio r, the population in year n is given by

    P_n + p_n = P_0 R^n + p_0 r^n

This does not represent a constant rate of increase unless R = r. If then,
for example, a constant percentage rate of increase be assumed for England
and Wales as a whole, it cannot be assumed for the Counties: if it be
assumed for the Counties, it cannot be assumed for the country as a whole.
The student is referred to refs. (116) and (117) for a discussion of methods
that may be used for the consistent estimation of populations in such
circumstances.
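The want of self-consistency is readily exhibited by a numerical case. In the Python sketch below all the census figures are invented for the purpose, and the function name midway_population is our own. Each of two districts is interpolated by the geometric mean of its own census figures, as in (7.16), and the sum is compared with the same interpolation applied to the area as a whole.

```python
import math

def midway_population(p_first, p_second):
    """Estimate midway between two censuses by the geometric mean, as in (7.16)."""
    return math.sqrt(p_first * p_second)

# Hypothetical census figures for two districts making up one area
district_a = (3600, 6400)     # rapid increase
district_b = (6400, 8100)     # slow increase

sum_of_parts = midway_population(*district_a) + midway_population(*district_b)
whole_area = midway_population(district_a[0] + district_b[0],
                               district_a[1] + district_b[1])
print(sum_of_parts, round(whole_area, 2))
```

The two estimates for the same area differ (12,000 against about 12,042), because the rates of increase of the two districts are unequal.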


Use of the Geometric Mean in Index-numbers. 

7.34. The property of the geometric mean illustrated by equation
(7.13) renders it, in some respects, a peculiarly convenient form of average
in dealing with ratios, i.e. “index-numbers,” as they are termed, of prices.¹
Let

    X_0', X_0'', X_0''', . . . X_0^{(N)}
    X_1', X_1'', X_1''', . . . X_1^{(N)}
    X_2', X_2'', X_2''', . . . X_2^{(N)}

denote the prices of N commodities in the years 0, 1, 2, . . . . Further,
let Y_{10}' = X_1'/X_0', and so on, so that

    Y_{10}', Y_{10}'', Y_{10}''', . . . Y_{10}^{(N)}
    Y_{20}', Y_{20}'', Y_{20}''', . . . Y_{20}^{(N)}

represent the ratios of the prices of the several commodities in years 1, 2,
. . . to their prices in year 0. These ratios, in practice multiplied by 100,
are termed index-numbers of the prices of the several commodities, on the
year 0 as base. Evidently some form of average of the Y's for any given
year will afford an indication of the general level of prices for that year,
provided the commodities chosen are sufficiently numerous and representative.
The question is, what form of average to choose. If the
geometric mean be chosen, and G_{10}, G_{20} denote the geometric means of the
Y's for the years 1 and 2 respectively, we have:


    G_{20}/G_{10} = ( Y_{20}'/Y_{10}' · Y_{20}''/Y_{10}'' · . . . · Y_{20}^{(N)}/Y_{10}^{(N)} )^{1/N}

                  = ( X_2'/X_1' · X_2''/X_1'' · . . . · X_2^{(N)}/X_1^{(N)} )^{1/N}

                  = ( Y_{21}' · Y_{21}'' · . . . · Y_{21}^{(N)} )^{1/N} . . . (7.17)


From the first form of this equation we see that the ratio of the geometric
mean index-number in year 2 to that in year 1 is identical with the geometric
mean of the ratios for the index-numbers of the several commodities.
A similar property does not hold for any other form of average: the ratio
of the arithmetic mean index-numbers is not the same as the arithmetic
mean of the ratios, nor is the ratio of the medians the median of the ratios.
From the second and third forms of the equation it appears further that the
ratio of the geometric mean index-number in year 2 to that in year 1 is
independent of the prices in the year first chosen as base (i.e. year 0), and
is identical with the geometric mean of the index-numbers for year 2 on
year 1 as base. Again, a similar property does not hold for any other form
of average. If arithmetic means of the index-numbers be taken, for
example, the ratio of the mean in year 2 to the mean in year 1 will vary
with the year taken as base, and will differ more or less from the arithmetic
mean ratio of the prices in year 2 to the prices of the same commodities in
year 1; the same statement is true if medians be used. The results given
by the use of the geometric mean possess, therefore, a certain consistency
that is not exhibited if other forms of average are employed. It was used
in a classical paper by Jevons (ref. (108)), though not on quite the same
grounds, but has never been at all generally employed, although it is now
in use for the index of wholesale prices compiled by the British Board of
Trade.

¹ The literature of index-numbers is extensive and it is impossible to discuss them
in the limits of this book. There is still difference of opinion as to the most suitable
form of an index-number, and we do not mean to prejudge this question in the above
section.
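The consistency claimed for the geometric mean in equation (7.17) may be verified on a small invented case. In the Python sketch below the prices of three commodities in years 0, 1 and 2 are hypothetical, and the helper names gmean, amean and index_numbers are our own. The ratio of the geometric mean index-numbers on year 0 as base agrees with the geometric mean of the index-numbers for year 2 on year 1 as base; the arithmetic means show no such agreement.

```python
import math

def gmean(values):
    """Geometric mean by way of logarithms, as in (7.11)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def amean(values):
    return sum(values) / len(values)

# Hypothetical prices of three commodities in years 0, 1 and 2
prices = {0: [10.0, 20.0, 5.0], 1: [12.0, 18.0, 10.0], 2: [15.0, 27.0, 8.0]}

def index_numbers(year, base):
    """Price ratios of the several commodities in `year` on `base` as base."""
    return [p / b for p, b in zip(prices[year], prices[base])]

g_ratio_base0 = gmean(index_numbers(2, 0)) / gmean(index_numbers(1, 0))
g_direct = gmean(index_numbers(2, 1))
a_ratio_base0 = amean(index_numbers(2, 0)) / amean(index_numbers(1, 0))
a_direct = amean(index_numbers(2, 1))
print(round(g_ratio_base0, 4), round(g_direct, 4))
print(round(a_ratio_base0, 4), round(a_direct, 4))
```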


The Harmonic Mean. 

7.35. The harmonic mean of a series of quantities is the reciprocal of
the arithmetic mean of their reciprocals; that is, if H be the harmonic mean,

    1/H = (1/N) S(1/X) . . . . (7.18)


The following illustration will serve to show the method of calculation:

Example 7.5. The table gives the number of litters of mice, in certain
breeding experiments, with given numbers (X) in the litter. (Data from
A. D. Darbishire, Biometrika, vol. 8, pp. 80, 81.)

    Number in     Number of
    Litter.       Litters.
    X.            f.            f/X.

    1              7            7.000
    2             11            5.500
    3             16            5.333
    4             17            4.250
    5             26            5.200
    6             31            5.167
    7             11            1.571
    8              1            0.125
    9              1            0.111

    Total        121           34.257

    Whence 1/H = 34.257/121 = 0.2831

           H = 3.532

The arithmetic mean is 4.587, more than a unit greater.
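The arithmetic of Example 7.5 can be repeated in a few lines. The Python sketch below is illustrative only, with variable names of our own choosing; it forms the sum of f/X from the litter frequencies of the table and thence the harmonic mean by (7.18), together with the arithmetic mean for comparison.

```python
# Litters of mice (Example 7.5): number in litter X and number of litters f
litter_sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9]
litters = [7, 11, 16, 17, 26, 31, 11, 1, 1]

n = sum(litters)                                              # N = 121
sum_f_over_x = sum(f / x for x, f in zip(litter_sizes, litters))
harmonic = n / sum_f_over_x                                   # H from 1/H = S(f/X)/N
arithmetic = sum(f * x for x, f in zip(litter_sizes, litters)) / n
print(round(sum_f_over_x, 3), round(harmonic, 3), round(arithmetic, 3))
```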


Reciprocal Character of Arithmetic and Harmonic Means. 

7.36. Prices may be stated in two different ways which are reciprocally
related, the resulting arithmetic mean of the one being the harmonic
mean of the other. Supposing we had 100 returns of retail prices of eggs,
50 returns showing twelve eggs to the shilling, 30 fourteen to the shilling,
and 20 ten to the shilling; then the mean number per shilling would be
12.2, equivalent to a price of 0.984d. per egg. But if the prices had been
quoted in the form usual for other commodities, we should have had 50
returns showing a price of 1d. per egg, 30 showing a price of 0.857d. and
20 a price of 1.2d.: arithmetic mean 0.997d., a slightly greater value
than the harmonic mean of 0.984d.

The harmonic mean of a series of positive quantities, not all equal, is
always lower than the geometric mean of the same quantities, and a fortiori,
lower than the arithmetic mean, the amount of difference depending largely
on the magnitude of the dispersion relatively to the magnitude of the mean (cf.
Exercise 8.13, p. 153).
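The reciprocal relation of 7.36 may be checked on the egg returns quoted there. The Python sketch below is illustrative, with variable names of our own; it takes twelve pence to the shilling, so that e eggs to the shilling is a price of 12/e pence per egg.

```python
# Returns of egg prices (7.36): eggs to the shilling -> number of returns
returns = {12: 50, 14: 30, 10: 20}
n = sum(returns.values())

# Arithmetic mean of the number of eggs to the shilling
mean_eggs_per_shilling = sum(e * f for e, f in returns.items()) / n

# Its equivalent price per egg in pence: the harmonic mean of the prices
harmonic_mean_price = 12 / mean_eggs_per_shilling

# Arithmetic mean of the prices per egg in pence
arithmetic_mean_price = sum(f * 12 / e for e, f in returns.items()) / n
print(round(mean_eggs_per_shilling, 1),
      round(harmonic_mean_price, 3),
      round(arithmetic_mean_price, 3))
```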


SUMMARY. 

1. Measures of the location or position of a frequency-distribution are
called averages.

2. There are three types of average in general use, the mean (arithmetic,
geometric and harmonic), the median and the mode.

3. The arithmetic mean of N values X_1, X_2, . . . X_N is given by

    M = (1/N) S(X)

The geometric mean is given by

    G = (X_1 X_2 . . . X_N)^{1/N}

    or log G = (1/N) S(log X)

The harmonic mean is given by

    1/H = (1/N) S(1/X)

4. The median is the central value of the variable when the values are
ranged in order of magnitude; if the number of values is even, the median
is conventionally taken to be the arithmetic mean of the two central values.

5. The mode is the value of the variate corresponding to the maximum of
the ideal curve which gives the closest possible fit to the actual distribution.

6. For distributions of moderate skewness there is an empirical relationship
between the mean, the median and the mode expressed by the equation

    Mode = Mean - 3(Mean - Median)


EXERCISES. 


7.1. Verify the following means and medians from the data of Table 6.7,
page 91:

                 Stature in Inches for Adult Males in
                 England.    Scotland.    Wales.    Ireland.
    Mean . .      67.31       68.55       66.62     67.78
    Median .      67.35       68.48       66.56     67.69

In the calculation of the means use the same arbitrary origin as in Example 7.1
and check your work by the method of 7.16 (b).




7.2. The mean of 13 numbers is 10, and the mean of 42 other numbers is 16. 
Find the mean of the 55 numbers taken together. 

7.3. Find the mean weight of adult males in the United Kingdom from the
data in the last column of Exercise 6.0, page 111. Find the median weight, and 
hence find the approximate mode by the relation of 7.27. 

7.4. Similarly, find the mean, median and approximate value of the mode 
for the distribution of fecundity in race-horses. Table 6.1>, page 98. 

7.5. Using a graphical method, find the median income subject to sur- or
super-tax in the financial year 1931 from the data of Table 6.5, page 89.

7.6. Find the arithmetic mean of the first n natural numbers and show that it 
coincides with the median. 

7.7. (Data from Agricultural Statistics, England and Wales, Part 2, 1932.)
The figures in columns 1 and 2 of the small table below show the index-numbers 
of prices of certain commodities in the harvest years 1926 and 1931, the years 
1911-13 being taken as 100. In column 3 have been added the ratios of the 
index-numbers in 1931 to those in 1926, the latter being taken as 100. 

Find the average ratio of prices in 1931 to those in 1926 — 

(1) From the arithmetic mean of the ratios in column 3. 

(2) From the ratio of the arithmetic means of columns 1 and 2. 

(3) From the ratio of the geometric means of columns 1 and 2. 

(4) From the geometric mean of the ratios of column 3. 

Note that, by 7.32, the last two methods must give the same result.


                 Index-number of Price in     Ratios,
Commodity.          1926.       1931.          31/26.
                      1.          2.             3.

Wheat                157          79            50.3
Fat Cattle           131         118            90.1
Milk                 163         139            85.3
Eggs                 149         110            73.8
Fruit                165         132            80.0
Vegetables           135         158           117.0


7.8. Find the arithmetic and geometric means of the series 1, 2, 4, 8, 16,
. . . $2^n$. Find also the harmonic mean.

7.9. Supposing the frequencies of values 0, 1, 2, . . . of a variable to be given
by the terms of the binomial series

$$q^n,\quad n q^{n-1} p,\quad \frac{n(n-1)}{1 \cdot 2}\, q^{n-2} p^2,\quad \ldots\quad p^n$$

where $p + q = 1$, find the mean.

7.10. Show that, in finding the arithmetic mean of a set of readings on a
thermometer, it does not matter whether we measure temperature in Centigrade
or Fahrenheit degrees, but that in finding the geometric mean it does matter.

7.11. (Data from Census of 1901.) The table below shows the population
of the rural sanitary districts of Essex, the urban sanitary districts (other than
the borough of West Ham), and the borough of West Ham, at the censuses
of 1891 and 1901. Estimate the total population of the county at a date midway
between the two censuses, (1) on the assumption that the percentage rate of
increase is constant for the county as a whole; (2) on the assumption that the
percentage rate of increase is constant in each group of districts and the borough
of West Ham.


Essex.                          Population.
                             1891.        1901.

Rural districts             232,867      240,776
West Ham                    204,903      267,358
Other urban districts       345,604      576,864

Total                       783,374    1,083,998


7.12. (Data from Agricultural Statistics , Part 2, 1932.) The following 
statement shows the monthly average prices of eggs in England and Wales in 
1932, as compiled from returns from certain markets for National Mark Specials 
and English Ordinaries, First Quality, per 120:

                   N.M. Specials.     English Ordinaries,
Month.                                  First Quality.
                     s.    d.             s.    d.

January              18    11             15     2
February             15     0             12     ?
March                11    11             10     0
April                10    10              9     2
May                  10     9              8     9
June                 12     0             10     0
July                 14     2             12     ?
August               15     ?             13     9
September            18    10             16     3
October              20     9             18     9
November             24     1             21     8
December             21     ?              ?    10

Mean for year         ?     ?             13    10


What would have been the mean price for the year in each case if the wholesale
prices had been recorded as retail prices sometimes are, i.e. at so many eggs per
shilling? State your answer in the form of the equivalent price per 120, and
obtain it in the shortest way by taking the harmonic mean of the above prices.








CHAPTER 8. 


MEASURES OF DISPERSION. 

Range. 

8.1. We can now turn to a consideration of measures of the dispersion
of variate values about the central values we have discussed in the last
chapter.

The simplest possible measure of dispersion is the range, i.e. the
difference between the greatest and least values observed. The extreme
ease with which this measure may be calculated and its very obvious
interpretation have led to its use in many industrial problems. There are,
however, serious objections to the use of the range which usually more
than offset these advantages.

In the first place, the range is subject to fluctuations of considerable
magnitude from sample to sample. There are seldom real upper or lower
limits to the values which a variable can take, large or small values
being only more or less infrequent. The occurrence of one of these
infrequent values may have quite a disproportionate effect on the range.
Suppose, for example, we consider the data of Exercise 6.6, page 111,
showing the frequency-distributions of weights of adult males in several
parts of the United Kingdom. In Wales one individual was observed with
a weight of over 280 lb., the next heaviest being under 260 lb. The addition
of this one exceptional man to 737 others has increased the range by some
30 lb., or about 20 per cent.

Moreover, the range takes no account of the form of the distribution
within the range. We might get the same value for the range from a
symmetrical and a J-shaped frequency-curve. Clearly we could not regard
two such distributions as exhibiting the same dispersion.

8.2. A measure of dispersion, in fact, should obey conditions similar
to those we laid down for measures of location in the last chapter (7.5).
That is to say, it should be based on all the observations, should be readily
comprehensible, fairly easily calculated, affected as little as possible by
fluctuations of sampling, and amenable to algebraical treatment.

There are three measures of dispersion in general use, the standard
deviation, the mean deviation and the quartile deviation or
semi-interquartile range. We will consider them in that order.

The Standard Deviation. 

8.3. The standard deviation is the square root of the arithmetic mean
of the squares of all deviations, deviations being measured from the
arithmetic mean of the observations. If the standard deviation be denoted by
$\sigma$, and a deviation from the arithmetic mean by $x$, then the standard
deviation is given by the equation

$$\sigma^2 = \frac{1}{N} S(x^2) \tag{8.1}$$

To square all the deviations may seem at first sight an artificial procedure, 
but it must be remembered that it would be useless to take the mere sum 
of the deviations, in order to obtain a measure of dispersion, since this sum 
is necessarily zero if deviations be taken from the mean. In order to 
obtain some quantity that shall vary with the dispersion, it is necessary to 
average the deviations by a process that treats them as if they were all of 
the same sign, and squaring is the simplest process for eliminating signs 
which leads to results of algebraical convenience. 
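
The definition of 8.3 can be put into a few lines of code. The sketch below is an illustration only: the function name `standard_deviation` and the eight sample figures are invented here, and the divisor is N, as in equation (8.1). It also shows that the plain sum of deviations from the mean vanishes, which is why it cannot serve as a measure of dispersion.

```python
import math

def standard_deviation(values):
    """Square root of the mean of squared deviations from the arithmetic mean (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)

sample = [2, 4, 4, 4, 5, 5, 7, 9]           # invented figures; their mean is 5
mean = sum(sample) / len(sample)
sum_of_deviations = sum(v - mean for v in sample)   # necessarily zero
sigma = standard_deviation(sample)

print("sum of deviations: ", sum_of_deviations)
print("standard deviation:", sigma)
```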

Root-mean-square Deviation. 

8.4. The standard deviation is a particular case of a more general
quantity, known as the root-mean-square deviation, which has theoretical
importance.

Let A be any arbitrary value of X, and let $\xi$ (as in 7.11) denote the
deviation of X from A; i.e. let

$$\xi = X - A$$

Then we may define the root-mean-square deviation s from the origin A
by the equation

$$s^2 = \frac{1}{N} S(\xi^2) \tag{8.2}$$

The standard deviation is the value of the root-mean-square deviation
taken from the mean.

8.5. The quantities $\sigma^2$ and $s^2$, i.e. the squares of the standard and
root-mean-square deviations, are sufficiently important in much theoretical
work to have special names.

The square of the standard deviation, $\sigma^2$, is called the variance.

The quantity $\frac{1}{N} S(\xi^2)$, i.e. $s^2$, is called the second moment about the
value A. We have already seen (7.11) that the quantity $\frac{1}{N} S(\xi)$ is called
the first moment about A, and in the next chapter we shall consider
moments of higher orders.

Thus, the variance is the second moment about the mean.


Relation between Standard and Root-mean-square Deviations. 

8.6. There is a very simple relation between the standard deviation
and the root-mean-square deviation from any other origin. Let

$$M - A = d$$

so that

$$\xi = x + d$$

Then

$$\xi^2 = x^2 + 2xd + d^2$$

$$S(\xi^2) = S(x^2) + 2d\,S(x) + N d^2 \tag{8.3}$$

But the sum of the deviations from the mean is zero, therefore the second
term vanishes, and accordingly, on dividing by N,

$$s^2 = \sigma^2 + d^2 \tag{8.4}$$



Hence the root-mean-square deviation is least when deviations are
measured from the mean, i.e. the standard deviation is the least possible
root-mean-square deviation.
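
Equation (8.4) is easily checked numerically. In the sketch below the helper `mean_square_deviation` and the sample figures are invented for illustration; the loop confirms that the mean-square deviation about several arbitrary origins A equals the variance plus the square of d = M - A.

```python
def mean_square_deviation(values, origin):
    """Second moment about an arbitrary origin: (1/N) S(xi^2), with xi = X - origin."""
    return sum((v - origin) ** 2 for v in values) / len(values)

sample = [2, 4, 4, 4, 5, 5, 7, 9]       # invented figures
M = sum(sample) / len(sample)            # arithmetic mean
variance = mean_square_deviation(sample, M)

for A in (3, 5, 6.5):                    # arbitrary origins
    d = M - A
    s2 = mean_square_deviation(sample, A)
    # s^2 = sigma^2 + d^2, so s^2 is never less than the variance
    assert abs(s2 - (variance + d * d)) < 1e-12
    print(f"A = {A}: s^2 = {s2}, sigma^2 + d^2 = {variance + d * d}")
```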

8.7. If $\sigma$ and d are the two sides of a right-angled triangle, s is the
hypotenuse. If, then, MH be the vertical through the mean of a
frequency-distribution (fig. 8.1), and MS be set off equal to the standard
deviation (on the same scale by which the variable X is plotted along the
base), SA will be the root-mean-square deviation from the point A. This
construction gives a concrete idea of the way in which the root-mean-square
deviation depends on the origin from which deviations are measured. It will
be seen that for small values of d the difference of s and $\sigma$ will be very
minute, since A will be very nearly on the circle drawn through M with
centre S and radius SM: slight errors in the mean due to approximations in
calculation will not, therefore, appreciably affect the value of the
standard deviation.

Fig. 8.1.

Calculation of the Standard Deviation. 

8.8. If we have to deal with relatively few, say thirty or forty,
ungrouped observations, the method of calculating the standard deviation
is perfectly straightforward. It is illustrated by the figures below giving
the minimum wage-rates for agricultural labourers in England and Wales
at the beginning of 1936.

First of all the mean is ascertained. Then we find the values of $x$ by
subtracting the mean from all values of the variable. Each difference is
squared and the total, $S(x^2)$, obtained. This total divided by the total
frequency is the square of the standard deviation.

In practice, we can simplify the arithmetic by working from an arbitrary
value A instead of from the mean. Such a value is usually known as the
“working mean.” When we have found the mean-square deviation $s^2$
about A we can easily find the value of $\sigma^2$ from equation (8.4).

Example 8.1. Calculation of Standard Deviation for a short series of
observations (49) ungrouped. Minimum weekly rates of wages for
ordinary adult male agricultural workers in England and Wales as at
1st January 1936.

By inspection of the table below we see that the mean is in the
neighbourhood of 32 shillings. We therefore take this as the working mean A.
The column headed “Difference” is the excess of the value of the variable
over this value. The column headed “(Difference)²” is the square of
the excess. We find

$$\frac{1}{N} S(\xi) = \frac{-79}{49} = -1.612 \text{ pence}$$

Hence the mean = 32 shillings - 1.612 pence
               = 31 shillings 10.4 pence approximately.




Area.                               Wage Rates.   Difference,   (Difference)²,
                                      s.   d.     ξ (pence).        ξ².

Bedford and Huntingdon shires         31    6         -6              36
Berkshire                             31    0        -12             144
Bucks                                 32    0          0               0
Cambridgeshire                        31    6         -6              36
Cheshire                              32    6          6              36
Cornwall                              32    0          0               0
Cumberland                            32    6          6              36
Derbyshire                            36    0         48           2,304
Dorset                                31    6         -6              36
Durham                                29    0        -36           1,296
Essex                                 31    0        -12             144
Gloucester                            31    0        -12             144
Hampshire                             31    0        -12             144
Hereford                              31    0        -12             144
Hertford                              32    0          0               0
Kent                                  33    0         12             144
Lancashire (South)                    32    9          9              81
    „      (Rest)                     36    6         54           2,916
Leicester                             33    0         12             144
Lincs (Holland)                       34    0         24             576
  „   (Kesteven and Lindsey)          31    0        -12             144
Middlesex                             33    8         20             400
Monmouth                              32    0          0               0
Norfolk                               31    6         -6              36
Northants                             31    6         -6              36
Northumberland                        31    6         -6              36
Notts                                 32    0          0               0
Oxfordshire                           31    6         -6              36
Rutland                               31    6         -6              36
Shropshire                            32    0          0               0
Somerset                              32    6          6              36
Staffs                                31    6         -6              36
Suffolk                               31    0        -12             144
Surrey                                32    3          3               9
Sussex                                32    0          0               0
Warwickshire                          30    0        -24             576
Westmorland                           31    0        -12             144
Wiltshire                             31    0        -12             144
Worcester                             31    0        -12             144
Yorks, E. Riding                      33    6         18             324
  „    N. Riding                      33    0         12             144
  „    W. Riding                      33    9         21             441
Anglesey and Caernarvon               31    0        -12             144
Carmarthen                            31    6         -6              36
Denbigh and Flint                     30    6        -18             324
Glamorgan                             33    6         18             324
Merioneth and Montgomery              28    6        -42           1,764
Pembroke and Cardigan                 31    0        -12             144
Radnor and Brecon                     30    0        -24             576

Totals                                               -79          14,539










Also

$$\frac{1}{N} S(\xi^2) = \frac{14{,}539}{49} = 296.714 = s^2$$

$$\sigma^2 = s^2 - d^2 = 296.714 - (1.612)^2 = 294.115$$

$$\sigma = 17.15 \text{ pence approximately.}$$
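
The arithmetic of Example 8.1 may be verified mechanically. The sketch below is offered only as a check, not as part of the original method: it takes the 49 differences ξ (in pence, from the working mean of 32 shillings) in the order of the table and applies equation (8.4). The variable names are illustrative.

```python
import math

# Differences (pence) from the working mean A = 32s., in the order of the table.
xi = [-6, -12, 0, -6, 6, 0, 6, 48, -6, -36, -12, -12, -12, -12, 0, 12, 9,
      54, 12, 24, -12, 20, 0, -6, -6, -6, 0, -6, -6, 0, 6, -6, -12, 3, 0,
      -24, -12, -12, -12, 18, 12, 21, -12, -6, -18, 18, -42, -12, -24]

N = len(xi)
A = 32 * 12                              # working mean in pence
d = sum(xi) / N                          # M - A
s2 = sum(x * x for x in xi) / N          # mean-square deviation about A
variance = s2 - d * d                    # equation (8.4)
sigma = math.sqrt(variance)

shillings, pence = divmod(A + d, 12)     # mean expressed as s. d.
print(f"N = {N}, S(xi) = {sum(xi)}, S(xi^2) = {sum(x * x for x in xi)}")
print(f"mean = {int(shillings)}s. {pence:.1f}d., sigma = {sigma:.2f} pence")
```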


We would direct the student’s attention to the necessity for checking
his work at each stage before proceeding to the next. If he neglects this
warning he is likely to learn by bitter experience how essential it was.
For instance, in the above work it would be well to check the value of
the mean by summing the wage-rates and dividing by 49. We get in
this way:

Mean = 1561s. 5d. / 49 = 31s. 10.4d.

which checks with the mean found from the working mean. Secondly,
the squares of differences should be checked before they are added, and
if the addition is made without a machine, a check should be carried out
by summing first from bottom to top and then from top to bottom,
to avoid repeating errors. A further systematic check is given in 8.10
below.

8.9. If we have to deal with a grouped frequency-distribution the
same artifices and approximations are used as in the calculation of the
mean (7.10 and 7.11). The mid-value of one of the class-intervals is
chosen as the arbitrary origin A from which to measure the deviations $\xi$;
the class-interval is treated as a unit throughout the arithmetic, and all
the observations within any one class-interval are treated as if they were
identical with the mid-value of the interval. If, as before, we denote the
frequency in any one interval by $f$, these $f$ observations contribute $f\xi^2$ to
the sum of the squares of deviations, and we have:

$$s^2 = \frac{1}{N} S(f\xi^2)$$

The standard deviation is then calculated from equation (8.4).

8.10. As the arithmetic in calculating the standard deviation is often
extensive, it is as well to use some check similar to that of 7.12. In
this case we have:

$$(\xi + 1)^2 = \xi^2 + 2\xi + 1$$

$$f(\xi + 1)^2 = f\xi^2 + 2f\xi + f$$

$$S\{f(\xi + 1)^2\} = S(f\xi^2) + 2S(f\xi) + N$$

Hence, if we calculate $S\{f(\xi + 1)^2\}$ as well as $S(f\xi^2)$, the above equation
gives us a simple check on the accuracy of our work. The following
examples illustrate the method:

Example 8.2. Calculation of the Standard Deviation of stature of
male adults in the British Isles from the figures of Table 6.7, page 94.



  (1)        (2)          (3)           (4)        (5)        (6)        (7)
Height,   Frequency,   Deviation     Product,                Product,
Inches.       f.      from Value A,     fξ.      f(ξ+1).       fξ².     f(ξ+1)².
                           ξ.

  57-          2          -10           -20        -18         200        162
  58-          4           -9           -36        -32         324        256
  59-         14           -8          -112        -98         896        686
  60-         41           -7          -287       -246       2,009      1,476
  61-         83           -6          -498       -415       2,988      2,075
  62-        169           -5          -845       -676       4,225      2,704
  63-        394           -4        -1,576     -1,182       6,304      3,546
  64-        669           -3        -2,007     -1,338       6,021      2,676
  65-        990           -2        -1,980       -990       3,960        990
  66-      1,223           -1        -1,223          0       1,223          0
  67-      1,329            0             0      1,329           0      1,329
  68-      1,230           +1         1,230      2,460       1,230      4,920
  69-      1,063           +2         2,126      3,189       4,252      9,567
  70-        646           +3         1,938      2,584       5,814     10,336
  71-        392           +4         1,568      1,960       6,272      9,800
  72-        202           +5         1,010      1,212       5,050      7,272
  73-         79           +6           474        553       2,844      3,871
  74-         32           +7           224        256       1,568      2,048
  75-         16           +8           128        144       1,024      1,296
  76-          5           +9            45         50         405        500
  77-          2          +10            20         22         200        242

Total      8,585                      +8,763    +13,759      56,809     65,752
                                      -8,584     -4,995

S(fξ)     =  8,763 - 8,584 =   179
S{f(ξ+1)} = 13,759 - 4,995 = 8,764


This is an example we have already considered when calculating 
the mean, and the work of the first four columns is the same as that of 
Example 7.1, page 110. 

As a check on $S(f\xi)$ we have:

$$S\{f(\xi + 1)\} - S(f\xi) = 8764 - 179 = 8585 = N$$

As a check on $S(f\xi^2)$ we have:

$$S\{f(\xi + 1)^2\} - S(f\xi^2) - 2S(f\xi) = 65{,}752 - 56{,}809 - 358 = 8585 = N$$

From previous work, $M - A = d = +0.0209$ class-intervals or inches.

$$s^2 = \frac{S(f\xi^2)}{N} = \frac{56{,}809}{8585} = 6.6172$$

$$\sigma^2 = 6.6172 - (0.0209)^2 = 6.6168$$

$$\therefore\ \sigma = 2.57 \text{ class-intervals or inches.}$$
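
The grouped calculation of Example 8.2, together with the check of 8.10, can be sketched as follows. This is an illustrative check under the conventions of 8.9 (class-interval taken as the unit, origin A at the mid-value of the interval 67-); the variable names are not from the text.

```python
import math

# Frequencies of Table 6.7 for the intervals 57-, 58-, ... 77- (inches).
freq = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
        1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]
xi = list(range(-10, 11))                # deviations from A in class-intervals

N = sum(freq)
S1 = sum(f * x for f, x in zip(freq, xi))                   # S(f xi)
S2 = sum(f * x * x for f, x in zip(freq, xi))               # S(f xi^2)
S1_check = sum(f * (x + 1) for f, x in zip(freq, xi))       # S{f(xi+1)}
S2_check = sum(f * (x + 1) ** 2 for f, x in zip(freq, xi))  # S{f(xi+1)^2}

# Checks of 8.10: both differences must reproduce the total frequency N.
assert S1_check - S1 == N
assert S2_check - S2 - 2 * S1 == N

d = S1 / N                               # M - A
variance = S2 / N - d * d                # equation (8.4)
sigma = math.sqrt(variance)
print(f"N = {N}, S(f xi) = {S1}, S(f xi^2) = {S2}, sigma = {sigma:.2f} in.")
```

An arithmetical slip in either product column would make one of the two assertions fail, which is the purpose of carrying the extra columns (5) and (7).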






Example 8.3. Let us find the mean and standard deviation of the
distribution of Australian marriages given in Table 6.8, page 96.

Calculation of Standard Deviation of age of bridegroom in a distribution
of Australian marriages.


   Age of
 Bridegroom    Frequency,
  (Central         f.         ξ.        fξ.       f(ξ+1).        fξ².       f(ξ+1)².
Value, Years).

    16.5            294       -4      -1,176        -882         4,704        2,646
    19.5         10,995       -3     -32,985     -21,990        98,955       43,980
    22.5         61,001       -2    -122,002     -61,001       244,004       61,001
    25.5         73,054       -1     -73,054           0        73,054            0
    28.5         56,501        0           0      56,501             0       56,501
    31.5         33,478        1      33,478      66,956        33,478      133,912
    34.5         20,569        2      41,138      61,707        82,276      185,121
    37.5         14,281        3      42,843      57,124       128,529      228,496
    40.5          9,320        4      37,280      46,600       149,120      233,000
    43.5          6,236        5      31,180      37,416       155,900      224,496
    46.5          4,770        6      28,620      33,390       171,720      233,730
    49.5          3,620        7      25,340      28,960       177,380      231,680
    52.5          2,190        8      17,520      19,710       140,160      177,390
    55.5          1,655        9      14,895      16,550       134,055      165,500
    58.5          1,100       10      11,000      12,100       110,000      133,100
    61.5            810       11       8,910       9,720        98,010      116,640
    64.5            649       12       7,788       8,437        93,456      109,681
    67.5            487       13       6,331       6,818        82,303       95,452
    70.5            326       14       4,564       4,890        63,896       73,350
    73.5            211       15       3,165       3,376        47,475       54,016
    76.5            119       16       1,904       2,023        30,464       34,391
    79.5             73       17       1,241       1,314        21,097       23,652
    82.5             27       18         486         513         8,748        9,747
    85.5             14       19         266         280         5,054        5,600
    88.5              5       20         100         105         2,000        2,205

   Total        301,785                88,832     390,617     2,155,838    2,635,287



We take a working mean A = 28.5.

As a check on $S(f\xi)$ we have:

$$S\{f(\xi + 1)\} - S(f\xi) = 390{,}617 - 88{,}832 = 301{,}785 = N$$

As a check on $S(f\xi^2)$ we have:

$$S\{f(\xi + 1)^2\} - S(f\xi^2) - 2S(f\xi) = 2{,}635{,}287 - 2{,}155{,}838 - 177{,}664 = 301{,}785 = N$$

Then

$$M - A = d = \frac{88{,}832}{301{,}785} = 0.29436 \text{ interval} = 0.88308 \text{ year}$$

Hence,

$$M = 29.383 \text{ years}$$

We have:

$$s^2 = \frac{2{,}155{,}838}{301{,}785} = 7.143622 \text{ intervals}^2$$

$$\sigma^2 = s^2 - d^2 = 7.056974 \text{ intervals}^2$$

$$\sigma = 2.6565 \text{ intervals} = 7.969, \text{ or 8 years approximately.}$$

Sheppard’s Correction for Grouping. 

8.11. The student must remember that the treatment of all the
values of a variable in a class-interval as if they were concentrated at
the centre of that interval is an approximation, although, for distributions
of symmetrical or moderately skew type and class-intervals not greater
than about one-twentieth of the range, the approximation may be a
very close one.

It has been shown that if

(a) the distribution of frequency is continuous, and

(b) the frequency tapers off to zero in both directions,

the variance obtained from grouped data may with advantage be corrected
for the grouping effect by subtracting from it one-twelfth of the square
of the class-interval; i.e. if the class-interval be h units in width, $\sigma^2$ the
corrected value of the variance and $\sigma_1^2$ the value obtained from the
grouped data:

$$\sigma^2 = \sigma_1^2 - \frac{h^2}{12} \tag{8.5}$$

The proof of this formula lies outside the scope of this book. We may
emphasise condition (b). The Sheppard correction is not applicable to
J- or U-shaped distributions, or even to the skew form of fig. 6.7 (b),
page 95.

Furthermore, unless the total frequency is fairly large, the Sheppard
correction is likely to be of secondary importance compared with
fluctuations of sampling (see 21.13). We suggest that, as a general rule,
the correction should not be made unless the frequency is at least
1000, or the grouping coarser than that given by intervals of about
one-twentieth of the range. We give in Exercise 8.15 a result which will
convey the general magnitude of the correction for the finer grouping.

Example 8.4. In Example 8.2 we have:

$$\sigma_1^2 = 6.6168$$

$$\frac{h^2}{12} = 0.0833$$

Corrected value $\sigma^2 = 6.5335$

and $\sigma$ corrected = 2.56, differing from the uncorrected value by 0.01.

Example 8.5. In Example 8.3 we have:

$\sigma^2$ (uncorrected) = 7.056974 intervals²

Here $\sigma^2$ is expressed in terms of $h^2$, and hence to correct it we subtract
1/12, giving

$\sigma^2$ (corrected) = 6.973641

$\sigma$ = 2.6408 intervals
  = 7.922 years

as against an uncorrected value of 7.969 years.
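
Equation (8.5) amounts to a single subtraction, as the following sketch shows for the two examples just given. The function name `sheppard_corrected_variance` is illustrative; the variances are those found in Examples 8.2 and 8.3, the second being worked in class-intervals (h = 1) and converted to years by the interval of 3 years at the end.

```python
import math

def sheppard_corrected_variance(grouped_variance, class_interval):
    """Subtract h^2/12 from the variance obtained from grouped data (equation 8.5)."""
    return grouped_variance - class_interval ** 2 / 12

# Example 8.4: stature, variance 6.6168 square inches, class-interval 1 inch.
sd_height = math.sqrt(sheppard_corrected_variance(6.6168, 1.0))

# Example 8.5: age of bridegroom, variance 7.056974 in squared class-intervals.
sd_age_intervals = math.sqrt(sheppard_corrected_variance(7.056974, 1.0))
sd_age_years = 3 * sd_age_intervals      # class-interval of 3 years

print(f"corrected s.d. of stature: {sd_height:.2f} in.")
print(f"corrected s.d. of age:     {sd_age_years:.3f} years")
```

The correction is appropriate only under conditions (a) and (b) of 8.11; the function itself cannot verify them.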

Spread of Observations and Standard Deviation. 

8.12. It is a useful empirical rule to remember that a range of six
times the standard deviation usually includes 99 per cent. or more of all
the observations in the case of distributions of the symmetrical or
moderately asymmetrical type. Thus in Example 8.2 the standard deviation
is 2.57 in., six times this is 15.42 in., and a range from, say, 60 in. to
75.4 in. includes all but some 30 out of 8585 individuals, i.e. about
99.6 per cent. This rough rule serves to give a more definite and concrete
meaning to the standard deviation, and also to check arithmetical work
to some extent, sufficiently, that is to say, to guard against very gross
blunders. It must not be expected to hold for short series of observations:
in Example 8.1, for instance, the actual range is a good deal less than
six times the standard deviation.

Properties of the Standard Deviation. 

8.13. The standard deviation is the measure of dispersion which it
is most easy to treat by algebraical methods, resembling in this respect
the arithmetic mean amongst measures of position. The majority of
illustrations of its treatment must be postponed to a later stage
(Chap. 16), but the work of 8.6 has already served as one example. We
showed in 7.16 that if a series of observations of which the mean is M
consists of two component series, of which the means are $M_1$ and $M_2$
respectively,

$$NM = N_1 M_1 + N_2 M_2$$

$N_1$ and $N_2$ being the numbers of observations in the two component
series, and $N = N_1 + N_2$ the number in the entire series. Similarly, the
standard deviation $\sigma$ of the whole series may be expressed in terms of
the standard deviations $\sigma_1$ and $\sigma_2$ of the components and their respective
means. Let

$$M_1 - M = d_1$$

$$M_2 - M = d_2$$

Then the mean-square deviations of the component series about the mean
M are, by equation (8.4), $\sigma_1^2 + d_1^2$ and $\sigma_2^2 + d_2^2$ respectively. Therefore,
for the whole series,

$$N\sigma^2 = N_1(\sigma_1^2 + d_1^2) + N_2(\sigma_2^2 + d_2^2) \tag{8.6}$$

If the numbers of observations in the component series be equal and the
means be coincident, we have as a special case:

$$\sigma^2 = \tfrac{1}{2}(\sigma_1^2 + \sigma_2^2) \tag{8.7}$$

so that in this case the square of the standard deviation of the whole
series is the arithmetic mean of the squares of the standard deviations of
its components.

It is evident that the form of the relation (8.6) is quite general: if a
series of observations consists of r component series with standard
deviations $\sigma_1, \sigma_2, \ldots \sigma_r$, and means diverging from the general mean of
the whole series by $d_1, d_2, \ldots d_r$, the standard deviation $\sigma$ of the whole
series is given (using m to denote any subscript) by the equation

$$N\sigma^2 = S(N_m \sigma_m^2) + S(N_m d_m^2) \tag{8.8}$$

Again, as in 7.16, it is convenient to note, for the checking of arithmetic,
that if the same arbitrary origin be used for the calculation of the standard
deviations in a number of component distributions, we must have:

$$S(f\xi^2) = S(f_1\xi_1^2) + S(f_2\xi_2^2) + \ldots + S(f_r\xi_r^2) \tag{8.9}$$
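
Equation (8.8) may be illustrated numerically. In the sketch below the function `combined_sd` and the two short component series are invented for the purpose; the standard deviation of the whole series built up from the components by (8.8) is compared with the value computed directly from the pooled observations.

```python
import math
import statistics

def combined_sd(components):
    """components: list of (N_m, M_m, sigma_m); returns sigma of the whole series by (8.8)."""
    N = sum(n for n, _, _ in components)
    M = sum(n * m for n, m, _ in components) / N          # general mean (7.16)
    total = sum(n * (s * s + (m - M) ** 2) for n, m, s in components)
    return math.sqrt(total / N)

first = [1, 2, 3, 4, 5]                 # invented component series
second = [10, 12, 14]

components = [(len(g), statistics.fmean(g), statistics.pstdev(g))
              for g in (first, second)]

sigma_combined = combined_sd(components)
sigma_direct = statistics.pstdev(first + second)          # divisor N, as in (8.1)
print(sigma_combined, sigma_direct)
```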

8.14. As another useful illustration, let us find the standard deviation
of the first N natural numbers. The mean in this case is evidently
$(N + 1)/2$. Further, as is shown in any elementary algebra, the sum of
the squares of the first N natural numbers is

$$\frac{N(N + 1)(2N + 1)}{6}$$

Applying equation (8.4) we have that the standard deviation $\sigma$ is given
by

$$\sigma^2 = \tfrac{1}{6}(N + 1)(2N + 1) - \tfrac{1}{4}(N + 1)^2$$

that is,

$$\sigma^2 = \tfrac{1}{12}(N^2 - 1) \tag{8.10}$$

This result is of service if the relative merit of, or the relative intensity
of some character in, the different individuals of a series is recorded not
by means of measurements, e.g. marks awarded on some system of
examination, but merely by means of the respective positions when
ranked in order as regards the character, in the same way as boys are
numbered in a class. With N individuals there are always N ranks, as
they are termed, whatever the character, and the standard deviation is
therefore always that given by equation (8.10).
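
Equation (8.10) can be confirmed for particular values of N by direct computation. The sketch below uses exact rational arithmetic so that the comparison is not disturbed by rounding; the function name `variance_of_ranks` is illustrative.

```python
from fractions import Fraction

def variance_of_ranks(n):
    """Variance of the first n natural numbers, computed from the definition."""
    ranks = range(1, n + 1)
    mean = Fraction(sum(ranks), n)
    mean_square = Fraction(sum(r * r for r in ranks), n)
    return mean_square - mean * mean      # equation (8.4) with origin A = 0

for n in (2, 5, 10, 100):
    assert variance_of_ranks(n) == Fraction(n * n - 1, 12)   # equation (8.10)
    print(n, variance_of_ranks(n))
```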

Another useful result follows at once from equation (8.10), namely, the
standard deviation of a frequency-distribution in which all values of X
within a range $\pm l/2$ on either side of the mean are equally frequent,
values outside these limits not occurring, so that the frequency-distribution
may be represented by a rectangle. The base $l$ may be supposed divided
into a very large number N of equal elements, and the standard deviation
reduces to that of the first N natural numbers when N is made indefinitely
large. The single unit then becomes negligible compared with N, and
consequently

$$\sigma^2 = \frac{l^2}{12} \tag{8.11}$$

8.15. It will be seen from the preceding paragraphs that the standard
deviation possesses the majority at least of the properties which are
desirable in a measure of dispersion as in an average (7.5). It is rigidly
defined; it is based on all the observations made; it is calculated with
reasonable ease; it lends itself readily to algebraical treatment; and we
may add, though the student will have to take the statement on trust
for the present, that it is, as a rule, the measure least affected by
fluctuations of sampling. On the other hand, it may be said that its general
nature is not very readily comprehended, and that the process of squaring
deviations and then taking the square root of the mean seems a little
involved. The student will, however, soon surmount this feeling after a
little practice in the calculation and use of the constant, and will realise,
as he advances further, the advantages that it possesses. Such
root-mean-square quantities, it may be added, frequently occur in other
branches of science. The standard deviation should always be used as
the measure of dispersion, unless there is some very definite reason for
preferring another measure, just as the arithmetic mean should be used
as the measure of position.

Note on Nomenclature. 

8.16. A great deal of confusion has been introduced into statistical
literature by the many different expressions which have been used for
the standard deviation and simple derivatives of it. It used to be almost
a case of tot homines quot nomina, and as the student may meet these
expressions elsewhere, we give a short list of them. The term “standard
deviation” is now almost universally accepted, and in this book we shall
use no other.

“Mean error” (Gauss), “mean square error” and “error of mean
square” (Airy) have all been used to denote the standard deviation.

The standard deviation is not to be confused with the “standard
error.” We shall use this term in a special sense, that of the standard
deviation of simple sampling (cf. 19.8).

The standard deviation multiplied by the square root of 2 is also known
as “the modulus.” The student will see the reason for this multiplication
later. The reciprocal of the modulus is called the “precision.”

There is also a quantity known as the “probable error,” which is
defined as being 0.67449 times the standard deviation (cf. 19.9). These
last four quantities are particularly important in the theory of errors of
observation and the theory of sampling.

Finally, we may remark that since we shall use the expression
“standard deviation” very frequently, we shall sometimes use the
abbreviation “s.d.” or simply the symbol $\sigma$.

Mean Deviation. 

8.17. We have already remarked that it would be useless to take the
sum of deviations from the mean as a measure of dispersion because such
sum is identically zero. We therefore removed the signs of the deviations
by squaring to reach the standard deviation.

It is also possible to overcome this difficulty by adding the deviations
taken regardless of sign. The arithmetic mean of these
“absolute” deviations is called the mean deviation.

If we write $|\xi|$ to denote the deviation from an arbitrary value A taken
as positive whatever its actual sign, the mean deviation is thus defined as

$$\text{m.d.} = \frac{1}{N} S(|\xi|) \tag{8.12}$$

(The expression $|\xi|$ is read “mod $\xi$”, an abbreviation for “the modulus
of $\xi$”.)

8.18. Just as the root-mean-square deviation is least when deviations
are measured from the arithmetic mean, so the mean deviation is least
when deviations are measured from the median. For suppose that, for
some origin exceeded by m values out of N, the mean deviation has a value
$\Delta$. Let the origin be displaced by an amount c until it is just exceeded by
m - 1 of the values only, i.e. until it coincides with the mth value from the
upper end of the series. By this displacement of the origin the sum of
deviations in excess of the origin is reduced by mc, while the sum of
deviations in defect of the origin is increased by (N - m)c. The new mean
deviation is therefore

$$\Delta + \frac{(N - m)c - mc}{N} = \Delta + \frac{1}{N}(N - 2m)c$$

The new mean deviation is accordingly less than the old so long as

$$m > \tfrac{1}{2}N$$

That is to say, if N be even, the mean deviation is constant for all
origins within the range between the N/2th and the (N/2 + 1)th
observations, and this value is the least; if N be odd, the mean deviation is
lowest when the origin coincides with the (N + 1)/2th observation. The mean
deviation is therefore a minimum when deviations are measured from the
median or, if the latter be indeterminate, from an origin within the range
in which it lies.
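
The property proved in 8.18 can be seen on a small invented series. In the sketch below (the function `mean_deviation` and the six values are illustrative only) N is even, so the mean deviation is the same for every origin between the two central values 4 and 8, and is greater when taken about the arithmetic mean, which here lies outside that range.

```python
def mean_deviation(values, origin):
    """Arithmetic mean of the deviations from `origin`, taken regardless of sign."""
    return sum(abs(v - origin) for v in values) / len(values)

sample = [1, 3, 4, 8, 10, 40]            # invented series; central values 4 and 8
mean = sum(sample) / len(sample)         # 11, pulled upward by the value 40

md_at_4 = mean_deviation(sample, 4)
md_at_6 = mean_deviation(sample, 6)      # the conventional median
md_at_8 = mean_deviation(sample, 8)
md_at_mean = mean_deviation(sample, mean)

print(md_at_4, md_at_6, md_at_8, md_at_mean)
```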


Calculation of the Mean Deviation. 

8.19. The mean deviation is perhaps most easily calculated about the
mean, which is always determinate, except in the case of distributions with
an indeterminate final class. As, however, it is a minimum about the
median, we sometimes require to know the value about that point. The
following examples will make the method of calculation clear.

Example 8.6. Let us find the mean deviation about the mean and
about the median in the ungrouped data of Example 8.1.

The data were arranged in alphabetical order of the county wage areas,
which makes it a little difficult to ascertain the median by inspection. On
rearranging in order of magnitude, we find that the median is the value
31s. 6d.

The deviations from the median value, in pence, are, then, in order of magnitude

-36, -30, -18, -18, -12, -6 (12 times), 0 (10 times),
6 (7 times), 9, 12, 12, 12, 15, 18, 18, 18, 24, 24, 26, 27,
30, 54, 60

The sum of the negative deviations = -186
The sum of the positive deviations = 401
Hence the sum of absolute deviations = 587

Hence m.d. = 587/49 = 12 pence approximately.



To find the m.d. about the mean, 31s. 10.4d., we note that the 27
negative or zero deviations from the median would be increased by 4.4
pence on transferring to the mean, and the 22 positive deviations decreased
by 4.4 pence. The net effect on the total absolute deviations is then an
increase of (27 - 22) x 4.4 pence = 22 pence.

Hence the m.d. about the mean is:

$$\frac{587}{49} + \frac{22}{49} = 12.43 \text{ pence}$$
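The arithmetic of this example can be retraced directly from the list of deviations. The sketch below, in Python, rebuilds the 49 deviations from the median as listed above, sums them without regard to sign, and then shifts the origin to the mean; the variable names are illustrative only.

```python
# Deviations from the median (31s. 6d.), in pence, as listed in Example 8.6.
deviations = (
    [-36, -30, -18, -18, -12] + [-6] * 12 + [0] * 10
    + [6] * 7 + [9, 12, 12, 12, 15, 18, 18, 18, 24, 24, 26, 27, 30, 54, 60]
)

n = len(deviations)                         # number of observations
sum_abs = sum(abs(x) for x in deviations)   # sum of absolute deviations
md_median = sum_abs / n                     # m.d. about the median

# The mean exceeds the median by the average signed deviation.
shift = sum(deviations) / n

# m.d. about the mean: measure every deviation from the shifted origin.
md_mean = sum(abs(x - shift) for x in deviations) / n

print(n, sum_abs, round(md_median, 2), round(shift, 1), round(md_mean, 2))
```

Because no observation falls between the median and the mean, the direct computation agrees with the shortcut of adding (27 - 22) times the shift to the sum of absolute deviations.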

Example 8.7. Let us find the mean deviation of heights about the
mean in the data of Example 8.2.

In the case of a grouped frequency-distribution the sum of deviations
should first be calculated from the centre of the class-interval in which the
mean (or median) lies and then reduced to the mean (or median) as origin.

In this case the mean lies in the interval 67-. We found when calculating
it that the negative deviations totalled 8584 and the positive deviations
8763. Hence the sum of absolute deviations from the centre of the
interval is 17,347, the unit of measurement being the class-interval.

To reduce to the mean as origin we note that if the number of observations
below the mean is $N_1$ and above the mean $N_2$, and $M - A = d$ as
before, we have to add $N_1 d$ to the sum so found and subtract $N_2 d$. In
this case $d = 0.02$ class-interval, $N_1 = 4918$ and $N_2 = 3667$.

Hence, we must add

$$(4918 - 3667) \times 0.02 = +25 \text{ intervals}$$

i.e. the total of deviations = 17,372, and

$$\text{m.d.} = \frac{17{,}372}{8{,}585} = 2.02 \text{ intervals or inches.}$$

The mean deviation from the median should be found in a similar way,
the calculation being assisted if the class-interval in which the median lies
is taken as origin.
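The same reduction can be sketched in Python. The class-frequencies below are read from the $f$ column of the height table printed with Example 9.1 of the next chapter (intervals 57- to 77-, working origin at the centre of the interval 67-); the variable names are illustrative, and the exact value of $d$ is used rather than the rounded 0.02.

```python
# Class-frequencies for the intervals 57-, 58-, ..., 77- (height table of Example 9.1).
freqs = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
         1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]
# Deviations of the interval centres from the centre of the interval 67-.
xis = list(range(-10, 11))

N = sum(freqs)
sum_signed = sum(f * x for f, x in zip(freqs, xis))
sum_abs_centre = sum(f * abs(x) for f, x in zip(freqs, xis))

d = sum_signed / N   # M - A, in class-intervals

# Observations treated as lying at interval centres: those at or below the
# working origin lie below the mean (d is positive), the rest above it.
n_below = sum(f for f, x in zip(freqs, xis) if x <= 0)
n_above = N - n_below

# Add N1*d and subtract N2*d to pass from the working origin to the mean.
sum_abs_mean = sum_abs_centre + (n_below - n_above) * d
md = sum_abs_mean / N

print(N, sum_abs_centre, n_below, n_above, round(d, 2), round(md, 2))
```

Since the class-interval is one inch, the result is in inches as well as in class-intervals.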

8.20. As in the case of the standard deviation, the above calculations
assume for certain purposes that all the values of the variable can be
treated as if they were concentrated at the centres of class-intervals. This
gives sufficient accuracy for all practical purposes if the class-intervals are
reasonably narrow. It has not been found possible to give any simple
correction, such as Sheppard's correction, for errors of grouping in the
mean deviation, but we give at the end of this chapter an exercise (8.11) as
to the correction to be applied if the values in each interval are treated
as if they were evenly distributed over the interval instead of being
concentrated at its centre.

Empirical Relation between Mean and Standard Deviations for 
Symmetrical or Moderately Skew Distributions. 

8.21. It is a useful rule for the student to remember that for symmetrical
or moderately skew distributions the mean deviation is about
four-fifths of the standard deviation. Thus, for the distribution of male
statures of Examples 8.2 and 8.7, we have:

$$\frac{\text{m.d.}}{\text{s.d.}} = \frac{2.02}{2.57} = 0.79$$

For the short series of observations of Example 8.1:

$$\frac{\text{m.d.}}{\text{s.d.}} = \frac{12.43}{17.15} = 0.72$$

Quartiles.

8.22. A natural extension of the idea of the median consists in ascertaining
the variate values $Q_1$ and $Q_3$ such that one-quarter of the observations
lies below $Q_1$ and one-quarter above $Q_3$. In this case clearly one-quarter
lies between $Q_1$ and Mi, the median, and one-quarter between Mi
and $Q_3$.

$Q_1$ is termed the lower quartile and $Q_3$ the upper quartile. The
quartiles and the median thus divide the observed values of the variable
into four classes of equal frequency.

We saw that if the number of observations was even, there was an
indeterminacy in the position of the median which required the additional
convention that in such cases the median would be taken to be mid-way
between the two central values. Similar indeterminacies may arise in
fixing the quartiles unless the number of observations is one less than a
multiple of four. Such cases are treated in an analogous way by supplementary
conventions, which will be clear from the following examples.


Example 8.8. To determine the quartiles of the data of Example 8.1.

Here there are 49 observations, and so the 25th gives the median.
We regard half the 25th observation as falling below the median and half
above. The lower quartile must divide into two equal parts the 24½
observations falling below the median. The observations other than the
median are:

28/6, 29/-, 30/-, 30/-, 30/6, 31/- (12 times), 31/6 (7 times).

The lower quartile must divide the 24½ observations into two sets of
12¼. The 12th and the 13th values are both, as it happens, 31/-, and $Q_1$,
being between the two, is thus 31/- also.

The 24 observations between the median and the highest value are:

31/6 (twice), 32/- (7 times), 32/3, 32/6 (3 times), 32/9, 33/- (3 times),
33/6, 33/6, 33/8, 33/9, 34/-, 36/-, 36/6.

The 12th and 13th observations are both 32/6, and hence this is the
value of $Q_3$.

If the 12th and 13th observations had been, say, 32/6 and 33/-, we
might have taken $Q_3$ to be 32/6 but regarded ¼ of the 12th observation
as lying above that value.

Example 8.9. To determine the quartiles of the distribution of
Example 8.2.

Data of this kind are treated by simple arithmetical interpolation or
graphical interpolation on the lines of 7.20 or 7.21.

The quartiles are to divide the distribution into four equal parts. We
have, therefore,

$$\frac{8585}{4} = 2146.25$$

To the interval 65- are 1376 individuals.

Difference = 770.25.

Hence, $Q_1$ is $\dfrac{770.25}{990}$ inches from the beginning of the interval, which gives

$$Q_1 = 65.71 \text{ inches}$$

Similarly, from the interval 70- onwards are 1374 individuals.

Difference from 2146.25 = 772.25.

Hence,

$$Q_3 = 69\tfrac{15}{16} - \frac{772.25}{1063} = 69.21 \text{ inches}$$

It is left to the student to check the values by graphical interpolation.
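The interpolation can be sketched as a small Python routine. Two things here are assumptions rather than statements of the text: the helper name `grouped_quantile` is hypothetical, and each class such as "65-" is taken to begin 1/16 inch below its nominal value (64 15/16 inches), a class limit that is not stated in this section. The frequencies are those of the height table printed with Example 9.1.

```python
def grouped_quantile(lower_bounds, freqs, width, p):
    """Value below which a proportion p of the total frequency lies,
    interpolating linearly within the class that contains it."""
    target = p * sum(freqs)
    cum = 0
    for lb, f in zip(lower_bounds, freqs):
        if f > 0 and cum + f >= target:
            return lb + width * (target - cum) / f
        cum += f
    raise ValueError("p must lie between 0 and 1")


# Class-frequencies for the intervals 57-, 58-, ..., 77-.
freqs = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
         1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]
# ASSUMPTION: each class begins 1/16 inch below its nominal value.
lower_bounds = [57 + i - 1 / 16 for i in range(len(freqs))]

q1 = grouped_quantile(lower_bounds, freqs, 1.0, 0.25)
q3 = grouped_quantile(lower_bounds, freqs, 1.0, 0.75)
semi_iqr = (q3 - q1) / 2

print(round(q1, 3), round(q3, 3), round(semi_iqr, 2))
```

Counting up from the lower end for both quartiles, as here, is equivalent to counting down from the upper end for $Q_3$; under the stated assumption the results agree with the printed quartiles to within about 0.01 inch.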

Quartile Deviation. 

8.23. If Mi be the value of the median, in a symmetrical distribution

$$\text{Mi} - Q_1 = Q_3 - \text{Mi}$$

and the difference may be taken as a measure of dispersion. But as no
distribution is rigidly symmetrical, it is usual to take as the measure

$$Q = \frac{Q_3 - Q_1}{2}$$

and $Q$ is termed the quartile deviation, or better, the semi-interquartile
range: it is not a measure of the deviation from any particular average.
Thus, from the values calculated in Example 8.8 we have:

$$Q = \frac{32/6 - 31/\text{-}}{2} = \frac{18d.}{2} = 9 \text{ pence}$$

and from Example 8.9 we have:

$$Q = \frac{69.21 - 65.71}{2} = 1.75 \text{ inches}$$

Empirical Relation between Quartile and Standard Deviations. 

8.24. For symmetrical and moderately skew distributions the semi-interquartile
range is usually about two-thirds of the standard deviation.

Thus, for the height distribution of Examples 8.2 and 8.9,

$$\frac{Q}{\sigma} = \frac{1.75}{2.57} = 0.68$$

For the wage statistics of Examples 8.1 and 8.8,

$$\frac{Q}{\sigma} = \frac{9}{17.15} = 0.52$$

which is considerably lower. We should, however, hardly have expected
the comparatively few observations comprised in these data to conform at
all closely to the empirical relation.

8.25. It follows from this relation that a range of 6 times the standard
deviation corresponds to a range of 9 times the semi-interquartile range
(and 7.5 times the mean deviation). Within these ranges we expect to
find at least 99 per cent. of the observations in symmetrical or moderately
skew distributions.

Comparison of the Three Measures of Dispersion. 

8.26. The semi-interquartile range has two advantages over the
standard deviation and the mean deviation; it is calculated with great
ease, and it has a clear and simple meaning.

In almost all other respects the advantage lies with the standard
deviation. The semi-interquartile range has no simple algebraical properties,
and its behaviour under fluctuations of sampling is difficult to
decide. In all but the most elementary statistical work these are overwhelming
disadvantages, and the use of the semi-interquartile range is not
to be recommended unless the calculation of the standard deviation has
been rendered difficult or impossible, e.g. owing to the employment of
irregular class-frequencies or of an indefinite terminal class.

Absolute Measures of Dispersion. 

8.27. The three measures of dispersion we have been discussing have
all been expressed in terms of the units of the variate; e.g. the standard
deviation of height-frequencies was found in inches, and the mean deviation
of wage-frequencies in pence. It is thus impossible to compare dispersions
in different universes unless they happen to be measured in the
same units.

For this reason some statisticians have recommended the use of
"absolute" measures of dispersion, which shall be pure numbers and
not expressible in some particular scale of units. Such measures would
permit of comparison between universes of very different natures.

It is easy to construct several coefficients of the kind required. The
standard deviation and the mean deviation have the dimensions of a
length, and it is only necessary to divide them by another factor which has
the same dimensions; e.g.

$$\frac{\text{Mean deviation}}{\text{Mean}}, \qquad \frac{\text{Mean deviation}}{\text{Mode}} \qquad \text{and} \qquad \frac{\text{Standard deviation}}{\text{Mean}}$$

are all of the required type.

Coefficient of Variation. 

8.28. The last-mentioned in the foregoing paragraph in a modified
form is the only coefficient which has come into general use. We define
the Coefficient of Variation, $v$, as

$$v = 100\,\frac{\sigma}{M} \qquad (8.13)$$

This coefficient has been used by Karl Pearson in comparing the relative
variations of corresponding organs or characters in the two sexes, and more
recently by G. S. Wilson in researches on the bacteriological grading of
milk (ref. (159)).


Reduction of Frequency-distribution to Absolute Scale. 

8.29. Comparability of form may, however, be reached in a different
way; that is to say, by regarding $\sigma$ itself as a unit and expressing other
measures in terms of it. Thus, in the height distribution of Example
8.2, $\sigma = 2.57$ inches, or 1 inch $= 0.389\sigma$. Hence the intervals are $0.389\sigma$
in width, and run: $57 \times 0.389\sigma$-, $58 \times 0.389\sigma$-, etc.: i.e. $22.173\sigma$-,
$22.562\sigma$-, etc.

A distribution expressed in this way has unit standard deviation, for

$$\frac{1}{N} S\left(\frac{x^2}{\sigma^2}\right) = \frac{1}{\sigma^2} \cdot \frac{1}{N} S(x^2) = \frac{\sigma^2}{\sigma^2} = 1$$

The distribution reduced to the scale of $\sigma$ may thus be regarded as
expressed in "absolute" units, and two distributions expressed in this way
may readily be compared as regards form, but not as regards dispersion,
for this has been made the same in the two cases.
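A short numerical illustration of the reduction to the scale of $\sigma$, sketched in Python with hypothetical measurements (not data from the text). The standard deviation is taken with divisor $N$, as in this chapter, which corresponds to `statistics.pstdev`.

```python
from statistics import pstdev

# Hypothetical heights in inches, for illustration only.
heights = [61.2, 64.5, 66.0, 67.3, 67.9, 68.4, 70.1, 72.6]

sigma = pstdev(heights)                 # standard deviation with divisor N
scaled = [x / sigma for x in heights]   # each value expressed in units of sigma

# On the sigma scale the standard deviation is unity (up to rounding error).
print(round(sigma, 3), round(pstdev(scaled), 12))
```

Only the scale is changed; the form of the distribution, for instance the ratio of the mean deviation to the standard deviation, is unaffected.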


Deciles and Percentiles. 

8.30. We may conclude this chapter by describing briefly methods
which have been much used in the past in lieu of the methods described
in this and the preceding chapter.

Instead of dividing the total frequency into 4 parts by quartiles, we
may divide it into 100 parts by what are called percentiles. Or we
may divide into 10 parts by deciles. The theory of these quantities is
precisely analogous to that of the quartiles: there may, for instance,
be certain indeterminacies in their exact definition which are removed
by supplementary conventions; they can be obtained by arithmetical or
graphical interpolation; and they have simple and obvious meanings.

Quantities such as quartiles, deciles, etc., which divide the total frequency
into a number of parts, are called grades, and when we speak of the
grade of an individual we mean thereby the proportion of the total frequency
which lies below it. Conventionally, half the individual is regarded as
lying above, and half below, the point determined by the variate value
which it bears.

8.31. The values of the percentiles may be used to draw what is
known as Galton's ogive curve. In fig. 8.2 we have plotted the 100
grades along the horizontal against the height corresponding to any
given percentile up the vertical, for the height distribution of Example 8.2.
The curve shows what percentage of the universe falls below any specified
height.

8.32. An extension of the method to the treatment of non-measurable
characters has also become of some importance. For example, the
capacity of the different boys in a class as regards some school subject
cannot be directly measured, but it may not be very difficult for the
master to arrange them in order of merit as regards this character: if the
boys are then "numbered up" in order, the number of each boy, or his
rank, serves as some sort of index to his capacity (cf. the remarks in
8.14). It should be noted that rank in this sense is not quite the same as
grade; if a boy is tenth, say, from the bottom in a class of a hundred
his grade is 9.5, but the method is in principle the same as that of grades
or percentiles. The method of ranks, grades or percentiles in such a
case may be a very serviceable auxiliary, though, of course, it is better if
possible to obtain a numerical measure. But if, in the case of a measurable
character, the percentiles are used not merely as constants illustrative of
certain aspects of the frequency-distribution, but entirely to replace the
table giving the frequency-distribution, serious inconvenience may be
caused, as the application of other methods to the data is barred. Given
the table showing the frequency-distribution, the reader can calculate
not only the percentiles, but any form of average or measure of dispersion
that has yet been proposed, to a sufficiently high degree of approximation.
But given only the percentiles, or at least so few of them as the nine
deciles, he cannot pass back to the frequency-distribution, and thence to
other constants, with any degree of accuracy. In all cases of published
work, therefore, the figures of the frequency-distribution should be given;
they are absolutely fundamental.

[Fig. 8.2. Ogive Curve for Stature (same data as fig. 6.6, p. 95). Horizontal axis: grade, 0 to 100. Vertical axis: stature corresponding to each grade, for adult males in the British Isles.]

SUMMARY. 

1. The standard deviation $\sigma$ is defined by

$$\sigma^2 = \frac{1}{N} S(x^2)$$

where $x$ is the deviation from the arithmetic mean. $\sigma^2$ is called the
"variance."

2. The root-mean-square deviation $s$ about a point $A$ is defined by

$$s^2 = \frac{1}{N} S(\xi^2)$$

where $\xi$ is the deviation from $A$.

3. If $M - A = d$, then

$$s^2 = \sigma^2 + d^2.$$

4. For grouped data the variance should be corrected by subtracting
$\dfrac{h^2}{12}$, where $h$ is the width of the class-interval, provided that (a) the
frequency is continuous, and (b) that it tapers off to zero in both directions.

5. The s.d. is the minimum root-mean-square deviation.

6. The mean deviation is defined as

$$\text{m.d.} = \frac{1}{N} S(|\xi|)$$

7. The m.d. is a minimum about the median.

8. The quartiles are the values of the variate which divide the total
frequency into 4 equal parts; similarly, the deciles divide it into 10 equal
parts and the percentiles into 100 equal parts.

9. The quartile deviation, or semi-interquartile range, is defined as

$$Q = \frac{Q_3 - Q_1}{2}$$

10. For symmetrical or moderately skew distributions,

$$\text{m.d.} = 0.8\sigma \quad \text{and} \quad Q = 0.67\sigma \quad \text{approximately.}$$

11. For the majority of such distributions 99 per cent. of the total
frequency lies within a range of $6\sigma$, 7.5 m.d. or $9Q$.


EXERCISES. 

8.1. Verify the following for the data of Table 6.7, page 94 (in continuation
of the work of Exercise 7.1):

Stature in Inches for Adult Males born in

                                          England  Scotland    Wales  Ireland
Standard deviation (uncorrected)             2.56      2.50     2.35     2.17
Mean deviation                               2.05      1.95     1.82     1.69
Quartile deviation                           1.78      1.56     1.46     1.35
Mean deviation/standard deviation            0.80      0.78     0.78     0.78
Quartile deviation/standard deviation        0.69      0.62     0.62     0.62
Lower quartile                              65.55     66.92    65.06    66.39
Upper quartile                              69.10     70.04    67.98    69.10


8.2. Find the standard deviation, mean deviation, quartiles and semi-interquartile
range for the data in the last column of the table of Exercise 6.6,
page 111 (in continuation of the work of Exercise 7.3).

Compare the ratios of mean and quartile deviations to the standard deviation
with those stated in 8.21 and 8.24 to be usual for moderately skew
distributions.

8.3. Using, or extending if necessary, your diagram for Exercise 7.5, page 132,
find the median and upper quartile for incomes subject to sur- or super-tax.

Find also the 9th decile (the value exceeded by 10 per cent. of incomes only).

8.4. Find the quartiles of the distribution of Australian marriages given in
Example 8.3, and find the semi-interquartile range.

8.5. Find directly the standard deviation of the natural numbers from 1 to 10,
and hence verify equation (8.10).

8.6. Show that, for any distribution, the standard deviation is not less than
the mean deviation about the mean.

8.7. Show that, for a J-shaped distribution with the maximum frequency
towards the lower values of the variate, the median is nearer to $Q_1$ than to $Q_3$.

8.8. Find the mean and standard deviation of the following numbers (1) without
further grouping, (2) grouping the numbers by fives (40-, 45-, 50-, etc.),
(3) grouping by tens (40-, 50-, etc.):

40, 43, 43, 46, 46, 46, 54, 56, 59, 62, 64, 64, 66, 66, 67, 67, 68, 68,
69, 69, 69, 71, 75, 75, 76, 76, 78, 80, 82, 82, 82, 82, 82, 83, 84,
86, 88, 90, 90, 91, 91, 92, 95, 102, 127.

8.9. Apply Sheppard's correction to the standard deviations calculated in
Exercises 8.1 and 8.2 above.

*8.10. (Continuing Exercise 7.9, p. 132.) Supposing the frequencies of values
0, 1, 2, 3, . . . of a variable to be given by the terms of the binomial series

$$q^n, \quad n q^{n-1} p, \quad \frac{n(n-1)}{1 \cdot 2}\, q^{n-2} p^2, \quad \ldots$$

where $p + q = 1$, find the standard deviation.

8.11. (Cf. the remarks at the end of 8.20.) The sum of the deviations (without
regard to sign) about the centre of the class-interval containing the mean
(or median), in a grouped frequency-distribution, is found to be $S$. Find the
correction to be applied to this sum, in order to reduce it to the mean (or median)
as origin, on the assumption that the observations are evenly distributed over
each class-interval. Take the number of observations below the interval
containing the mean (or median) to be $n_1$, in that interval $n_2$, and above it $n_3$,
and the distance of the mean (or median) from the arbitrary origin to be $d$.

8.12. (W. Scheibner, "Ueber Mittelwerthe," Berichte der kgl. sächsischen
Gesellschaft d. Wissenschaften, 1873, p. 564, cited by Fechner, ref. (103): the
second form of the relation is given by G. Duncker ("Die Methode der
Variationsstatistik"; Leipzig, 1899) as an empirical one.) Show that if deviations are small
compared with the mean, so that $(x/M)^3$ and higher powers of $x/M$ may be
neglected, we have approximately the relation

$$G = M\left(1 - \frac{1}{2}\frac{\sigma^2}{M^2}\right)$$

where $G$ is the geometric mean, $M$ the arithmetic mean and $\sigma$ the standard
deviation: and consequently to the same degree of approximation $M^2 - G^2 = \sigma^2$.

8.13. (Scheibner, loc. cit.) Similarly, show that if deviations are small
compared with the mean, we have approximately

$$H = M\left(1 - \frac{\sigma^2}{M^2}\right)$$

$H$ being the harmonic mean.

8.14. Find the coefficients of variation of the height distributions of Exercise
8.1 (using the uncorrected values of the s.d. as given).

8.15. Show that if a range of six times the standard deviation covers at
least 18 class-intervals, Sheppard's correction will make a difference of less than
0.5 per cent. in the uncorrected value of the standard deviation.



CHAPTER 9. 


MOMENTS AND MEASURES OF SKEWNESS 
AND KURTOSIS. 

Moments. 

9.1. In considering the calculation of the mean and the root-mean-square
deviation we have defined, in passing, the quantities $\frac{1}{N}S(f\xi)$ and
$\frac{1}{N}S(f\xi^2)$ as the first and second moments about the value $A$, $\xi$ being as
before the value $X - A$, i.e. the excess of the variate value $X$ over the value
$A$. The first moment about the mean is zero, and the second moment
about the mean is the variance (8.5).

In generalisation of these definitions we now define the $n$th moment
about $A$ as $\mu_n'$, where

$$\mu_n' = \frac{1}{N} S(f\xi^n) \qquad (9.1)$$

The moments about the mean, which are of particular importance,
we write without dashes, so that

$$\mu_n = \frac{1}{N} S(f x^n) \qquad (9.2)$$

From these definitions we have:

$$\mu_0 = \mu_0' = \frac{1}{N} S(f) = 1 \quad \text{since } \xi^0 \text{ and } x^0 = 1$$

$$\mu_1' = \frac{1}{N} S(f\xi)$$

$$\mu_1 = 0$$

These results we have already seen.

9.2. The word "moment" derives from Statics, and we may direct
the attention of the student who is familiar with moments of forces to the
fact that the sum $S(f\xi^n)$ is divided by $N$ in the definition above. This
amounts to a slight departure from the Statical practice, and some writers
refer to what we have called "moments" as "moment-coefficients" in
order to keep this fact in mind. In Statistics, however, no confusion is
likely to arise from the use of the briefer form "moments."

The expression "moments" is also used by some writers to denote
exclusively the moments about the mean, except in the case of the first
moment, which is zero about the mean, and which, therefore, is understood
to be related to the origin under consideration at the moment.
We shall not adopt this practice.


Moments about the Mean in terms of Moments about Any Point. 
9.3. We have, by definition,

$$\xi = X - A = (X - M) + (M - A) = x + d$$

Hence,

$$\xi^n = (x + d)^n$$

and

$$S(f\xi^n) = S\{f(x + d)^n\}$$

Now, by the binomial theorem,

$$(x + d)^n = x^n + {}^nC_1\, d\, x^{n-1} + {}^nC_2\, d^2 x^{n-2} + \ldots + d^n$$

Hence,

$$S(f\xi^n) = S(fx^n) + {}^nC_1\, d\, S(fx^{n-1}) + {}^nC_2\, d^2 S(fx^{n-2}) + \ldots + d^n S(f)$$

Dividing by $N$ we get:

$$\mu_n' = \mu_n + {}^nC_1\, d\, \mu_{n-1} + {}^nC_2\, d^2 \mu_{n-2} + \ldots + d^n \qquad (9.3)$$

Similarly,

$$S(fx^n) = S\{f(\xi - d)^n\}$$

and

$$\mu_n = \mu_n' - {}^nC_1\, d\, \mu_{n-1}' + {}^nC_2\, d^2 \mu_{n-2}' - \ldots + (-1)^n d^n \qquad (9.4)$$

These useful relations express the moments about the mean in terms
of those about an arbitrary point $A$, and vice versa.

In particular we have:

If $n = 1$,

$$\mu_1' = \mu_1 + d = d \qquad \text{from (9.3)}$$

$$\mu_1 = \mu_1' - d = 0 \qquad \text{from (9.4)}$$

which are simply the relation $M - A = d$ in another form.

If $n = 2$,

$$\mu_2' = \mu_2 + 2d\mu_1 + d^2 = \mu_2 + d^2 = \sigma^2 + d^2 \qquad \text{from (9.3)}$$

$$\mu_2 = \mu_2' - 2d\mu_1' + d^2 = \mu_2' - 2d^2 + d^2 \qquad \text{from (9.4)}$$

These are the relation $\mu_2' = \sigma^2 + d^2$.

If $n = 3$,

$$\mu_3' = \mu_3 + 3d\mu_2 + 3d^2\mu_1 + d^3 = \mu_3 + 3d\mu_2 + d^3 \qquad \text{from (9.3)} \qquad (9.5)$$

$$\mu_3 = \mu_3' - 3d\mu_2' + 3d^2\mu_1' - d^3 = \mu_3' - 3d\mu_2' + 2d^3 \qquad \text{from (9.4)} \qquad (9.6)$$

If $n = 4$,

$$\mu_4' = \mu_4 + 4d\mu_3 + 6d^2\mu_2 + 4d^3\mu_1 + d^4 = \mu_4 + 4d\mu_3 + 6d^2\mu_2 + d^4 \qquad \text{from (9.3)} \qquad (9.7)$$

$$\mu_4 = \mu_4' - 4d\mu_3' + 6d^2\mu_2' - 4d^3\mu_1' + d^4 = \mu_4' - 4d\mu_3' + 6d^2\mu_2' - 3d^4 \qquad \text{from (9.4)} \qquad (9.8)$$

Calculation of Moments. 

9.4. The calculation of moments of the third and higher orders is
similar to that of the first and second. For grouped data we regard the
observations as concentrated at the mid-points of the intervals; we choose
a convenient arbitrary origin $A$, find the moments about it and use the
relations (9.3) and (9.4) above to find the moments about the mean; we
use a check on the arithmetic similar to that of 8.10; and we have under
certain conditions certain Sheppard corrections for grouping.

In practice we rarely require to ascertain moments higher than the
fourth. Indeed, moments of higher orders, though important in theory,
are so extremely sensitive to sampling fluctuations that values calculated
for moderate numbers of observations are quite unreliable and hardly ever
repay the labour of computation.

9.5. There are various checks in use for the arithmetic of calculation.
We shall use a generalisation of the simple identities of 7.12 and 8.10.
In fact, we have

$$(\xi + 1)^3 = \xi^3 + 3\xi^2 + 3\xi + 1$$

and hence,

$$S\{f(\xi + 1)^3\} = S(f\xi^3) + 3S(f\xi^2) + 3S(f\xi) + N$$

Similarly,

$$S\{f(\xi + 1)^4\} = S(f\xi^4) + 4S(f\xi^3) + 6S(f\xi^2) + 4S(f\xi) + N$$

and so on.

Thus, in calculating $S(f\xi^n)$ we also find $S\{f(\xi + 1)^n\}$, and this, together
with the sums of lower orders, will give us a ready check on the work.

This check is sometimes known as the Charlier check, after C. V. L.
Charlier, the Swedish statistician.
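A minimal sketch of the check in Python, applied to the class-frequencies of the height distribution tabulated in Example 9.1 below (intervals 57- to 77-, deviations measured in class-intervals from the centre of the interval 67-). The helper name `S` is illustrative only.

```python
# Class-frequencies for the intervals 57-, 58-, ..., 77-.
freqs = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
         1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]
# Deviations xi of the interval centres from the working origin.
xis = list(range(-10, 11))


def S(power, shift=0):
    """Sum of f * (xi + shift)**power over all classes."""
    return sum(f * (x + shift) ** power for f, x in zip(freqs, xis))


N = S(0)

# Each sum of f(xi+1)^n must equal the binomial combination of lower sums.
lhs3 = S(3, shift=1)
rhs3 = S(3) + 3 * S(2) + 3 * S(1) + N
lhs4 = S(4, shift=1)
rhs4 = S(4) + 4 * S(3) + 6 * S(2) + 4 * S(1) + N

print(N, lhs3, rhs3, lhs4, rhs4)
```

In hand computation the two sides are obtained from separate columns, so agreement is a genuine check; in a program both sides come from the same data, so the sketch only illustrates the identity.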

Example 9.1. Continuing our work on the height distribution of
Table 6.7, page 94, let us find the third and fourth moments of the distribution
about the mean.

In almost all practical work we require the first and second moments
as a matter of course. It is therefore best to proceed systematically in
the computation of the various moments by setting out the arithmetic in
tabular form, as in the table below.

From this table we have:

$$S(f\xi) = 8{,}763 - 8{,}584 = 179$$

$$S(f\xi^2) = 56{,}809$$

$$S(f\xi^3) = 119{,}391 - 117{,}622 = 1{,}769$$

$$S(f\xi^4) = 1{,}182{,}061$$



Calculation of First Four Moments of the Distribution of Heights
of Table 6.7, p. 94.

Height,        f    ξ       fξ   f(ξ+1)      fξ²  f(ξ+1)²        fξ³    f(ξ+1)³         fξ⁴     f(ξ+1)⁴
inches.
57-            2  -10      -20      -18      200      162     -2,000     -1,458      20,000      13,122
58-            4   -9      -36      -32      324      256     -2,916     -2,048      26,244      16,384
59-           14   -8     -112      -98      896      686     -7,168     -4,802      57,344      33,614
60-           41   -7     -287     -246    2,009    1,476    -14,063     -8,856      98,441      53,136
61-           83   -6     -498     -415    2,988    2,075    -17,928    -10,375     107,568      51,875
62-          169   -5     -845     -676    4,225    2,704    -21,125    -10,816     105,625      43,264
63-          394   -4   -1,576   -1,182    6,304    3,546    -25,216    -10,638     100,864      31,914
64-          669   -3   -2,007   -1,338    6,021    2,676    -18,063     -5,352      54,189      10,704
65-          990   -2   -1,980     -990    3,960      990     -7,920       -990      15,840         990
66-        1,223   -1   -1,223        0    1,223        0     -1,223          0       1,223           0
67-        1,329    0        0    1,329        0    1,329          0      1,329           0       1,329
68-        1,230    1    1,230    2,460    1,230    4,920      1,230      9,840       1,230      19,680
69-        1,063    2    2,126    3,189    4,252    9,567      8,504     28,701      17,008      86,103
70-          646    3    1,938    2,584    5,814   10,336     17,442     41,344      52,326     165,376
71-          392    4    1,568    1,960    6,272    9,800     25,088     49,000     100,352     245,000
72-          202    5    1,010    1,212    5,050    7,272     25,250     43,632     126,250     261,792
73-           79    6      474      553    2,844    3,871     17,064     27,097     102,384     189,679
74-           32    7      224      256    1,568    2,048     10,976     16,384      76,832     131,072
75-           16    8      128      144    1,024    1,296      8,192     11,664      65,536     104,976
76-            5    9       45       50      405      500      3,645      5,000      32,805      50,000
77-            2   10       20       22      200      242      2,000      2,662      20,000      29,282

Negative                -8,584   -4,995                      -117,622    -55,335
terms
Total      8,585         8,763   13,759   56,809   65,752    119,391    236,653   1,182,061   1,539,292
(positive
terms)


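As a check on $S(f\xi^3)$ we have:

$$\begin{aligned}
S(f\xi^3) + 3S(f\xi^2) + 3S(f\xi) + N &= 1{,}769 + 170{,}427 + 537 + 8{,}585 \\
&= 181{,}318 \\
&= S\{f(\xi + 1)^3\}
\end{aligned}$$

As a check on $S(f\xi^4)$ we have:

$$\begin{aligned}
S(f\xi^4) + 4S(f\xi^3) + 6S(f\xi^2) + 4S(f\xi) + N &= 1{,}182{,}061 + 7{,}076 + 340{,}854 + 716 + 8{,}585 \\
&= 1{,}539{,}292 \\
&= S\{f(\xi + 1)^4\}
\end{aligned}$$

We have then:

$$d = \mu_1' = \frac{179}{8{,}585} = 0.020,850,32$$

$$\mu_2' = \frac{56{,}809}{8{,}585} = 6.617,239,37$$

$$\mu_3' = \frac{1{,}769}{8{,}585} = 0.206,057,08$$

$$\mu_4' = \frac{1{,}182{,}061}{8{,}585} = 137.689,108,91$$

and

$$\mu_2 = \mu_2' - d^2 = 6.616,805$$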




From equation (9.6):

$$\begin{aligned}
\mu_3 &= \mu_3' - 3d\mu_2' + 2d^3 \\
&= 0.206,057,08 - 0.413,914,67 + 0.000,018,13 \\
&= -0.207,839
\end{aligned}$$

From equation (9.8):

$$\begin{aligned}
\mu_4 &= \mu_4' - 4d\mu_3' + 6d^2\mu_2' - 3d^4 \\
&= 137.689,108,91 - 0.017,184,24 + 0.017,260,51 - 0.000,000,57 \\
&= 137.689,185
\end{aligned}$$

which gives us $\mu_2$, $\mu_3$, $\mu_4$ in units based on class-intervals, i.e. inches.
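The whole computation of this example can be sketched in a few lines of Python, using the class-frequencies of the table above and the relations of 9.3 with $\mu_1' = d$. Variable names are illustrative; because the program carries full floating-point precision, the last one or two printed digits may differ slightly from hand-computed figures.

```python
# Class-frequencies for the intervals 57-, 58-, ..., 77-.
freqs = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
         1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]
# Deviations of interval centres from the working origin (centre of 67-).
xis = list(range(-10, 11))

N = sum(freqs)

# Moments about the working origin: raw[k] = S(f * xi^k) / N.
raw = [sum(f * x ** k for f, x in zip(freqs, xis)) / N for k in range(5)]
d = raw[1]

# Moments about the mean, from the relations for n = 2, 3 and 4.
mu2 = raw[2] - d ** 2
mu3 = raw[3] - 3 * d * raw[2] + 2 * d ** 3
mu4 = raw[4] - 4 * d * raw[3] + 6 * d ** 2 * raw[2] - 3 * d ** 4

print(round(d, 8), round(mu2, 6), round(mu3, 6), round(mu4, 4))
```

The results are in class-intervals, which here are inches.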

Example 9.2. To find the moments about the mean of the distribution
of Australian marriages of Table 6.8, page 96.

Until the last stage we work in class-intervals of 3 years. As in
Example 8.3, page 140, we take a working mean at 28.5 years.


Calculation of the First Four Moments of the Distribution of Marriages
of Table 6.8, p. 96.

Mid-value          f    ξ        fξ    f(ξ+1)        fξ²    f(ξ+1)²         fξ³     f(ξ+1)³          fξ⁴      f(ξ+1)⁴
of interval,
years.
16.5             294   -4    -1,176      -882      4,704      2,646     -18,816      -7,938       75,264       23,814
19.5          10,995   -3   -32,985   -21,990     98,955     43,980    -296,865     -87,960      890,595      175,920
22.5          61,001   -2  -122,002   -61,001    244,004     61,001    -488,008     -61,001      976,016       61,001
25.5          73,054   -1   -73,054         0     73,054          0     -73,054           0       73,054            0
28.5          56,501    0         0    56,501          0     56,501           0      56,501            0       56,501
31.5          33,478    1    33,478    66,956     33,478    133,912      33,478     267,824       33,478      535,648
34.5          20,569    2    41,138    61,707     82,276    185,121     164,552     555,363      329,104    1,666,089
37.5          14,281    3    42,843    57,124    128,529    228,496     385,587     913,984    1,156,761    3,655,936
40.5           9,320    4    37,280    46,600    149,120    233,000     596,480   1,165,000    2,385,920    5,825,000
43.5           6,236    5    31,180    37,416    155,900    224,496     779,500   1,346,976    3,897,500    8,081,856
46.5           4,770    6    28,620    33,390    171,720    233,730   1,030,320   1,636,110    6,181,920   11,452,770
49.5           3,620    7    25,340    28,960    177,380    231,680   1,241,660   1,853,440    8,691,620   14,827,520
52.5           2,190    8    17,520    19,710    140,160    177,390   1,121,280   1,596,510    8,970,240   14,368,590
55.5           1,655    9    14,895    16,550    134,055    165,500   1,206,495   1,655,000   10,858,455   16,550,000
58.5           1,100   10    11,000    12,100    110,000    133,100   1,100,000   1,464,100   11,000,000   16,105,100
61.5             810   11     8,910     9,720     98,010    116,640   1,078,110   1,399,680   11,859,210   16,796,160
64.5             649   12     7,788     8,437     93,456    109,681   1,121,472   1,425,853   13,457,664   18,536,089
67.5             487   13     6,331     6,818     82,303     95,452   1,069,939   1,336,328   13,909,207   18,708,592
70.5             326   14     4,564     4,890     63,896     73,350     894,544   1,100,250   12,523,616   16,503,750
73.5             211   15     3,165     3,376     47,475     54,016     712,125     864,256   10,681,875   13,828,096
76.5             119   16     1,904     2,023     30,464     34,391     487,424     584,647    7,798,784    9,938,999
79.5              73   17     1,241     1,314     21,097     23,652     358,649     425,736    6,097,033    7,663,248
82.5              27   18       486       513      8,748      9,747     157,464     185,193    2,834,352    3,518,667
85.5              14   19       266       280      5,054      5,600      96,026     112,000    1,824,494    2,240,000
88.5               5   20       100       105      2,000      2,205      40,000      46,305      800,000      972,405

Negative                   -229,217   -83,873                          -876,743    -156,899
terms
Totals       301,785        318,049   474,490  2,155,838  2,635,287  13,675,105  19,991,056  137,306,162  202,091,751
(positive
terms)


From this table we have:

$$S(f\xi) = 318{,}049 - 229{,}217 = 88{,}832$$

$$S(f\xi^2) = 2{,}155{,}838$$

$$S(f\xi^3) = 13{,}675{,}105 - 876{,}743 = 12{,}798{,}362$$

$$S(f\xi^4) = 137{,}306{,}162$$

As a check on $S(f\xi)$ we have:

$$S(f\xi) + N = 88{,}832 + 301{,}785 = 390{,}617 = S\{f(\xi + 1)\}$$

Similarly, for $S(f\xi^2)$:

$$\begin{aligned}
S(f\xi^2) + 2S(f\xi) + N &= 2{,}155{,}838 + 177{,}664 + 301{,}785 \\
&= 2{,}635{,}287 \\
&= S\{f(\xi + 1)^2\}
\end{aligned}$$

As a check on $S(f\xi^3)$:

$$\begin{aligned}
S(f\xi^3) + 3S(f\xi^2) + 3S(f\xi) + N &= 12{,}798{,}362 + 6{,}467{,}514 + 266{,}496 + 301{,}785 \\
&= 19{,}834{,}157 \\
&= S\{f(\xi + 1)^3\}
\end{aligned}$$

As a check on $S(f\xi^4)$:

$$\begin{aligned}
S(f\xi^4) + 4S(f\xi^3) + 6S(f\xi^2) + 4S(f\xi) + N &= 137{,}306{,}162 + 51{,}193{,}448 + 12{,}935{,}028 + 355{,}328 + 301{,}785 \\
&= 202{,}091{,}751
\end{aligned}$$

Hence, about the working mean:

$$d = \mu_1' = \frac{88{,}832}{301{,}785} = 0.294,355,253$$

$$\mu_2' = \frac{2{,}155{,}838}{301{,}785} = 7.143,622,115$$

$$\mu_3' = \frac{12{,}798{,}362}{301{,}785} = 42.408,873,867$$

$$\mu_4' = \frac{137{,}306{,}162}{301{,}785} = 454.980,075,219$$

For moments about the mean:

$$\mu_2 = \mu_2' - d^2 = 7.056,977$$

$$\mu_3 = \mu_3' - 3d\mu_2' + 2d^3 = 36.151,595$$

$$\mu_4 = \mu_4' - 4d\mu_3' + 6d^2\mu_2' - 3d^4 = 408.738,210$$

These are expressed in class-intervals, which are units of three years.
If, as we rarely do, we wish to express the results in other units, say one
year, we must multiply the first moment by 3, the second by $3^2$, the third
by $3^3$, the fourth by $3^4$, and so on; e.g.

$$
\mu_2 = 7.056,977 \times 9 = 63.512,79
$$

In this and the preceding example we have retained more digits than
are probably necessary, but the student will find it as well to retain several
more than appear to be required, since subsequent work involving
multiplication or addition may otherwise throw doubt on the final figures.
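
The arithmetic of this example can be retraced with a short Python sketch. It is only an illustration: the function names `moments_about_mean` and `rescale` are hypothetical and not part of the text, and the inputs are the totals N, S(fξ), S(fξ²), S(fξ³) and S(fξ⁴) quoted above.

```python
def moments_about_mean(n_total, s1, s2, s3, s4):
    """Return (d, mu2, mu3, mu4) from N and the sums S(f*xi^k), k = 1..4,
    taken about a working mean and measured in class-intervals."""
    d = s1 / n_total    # first moment about the working mean
    m2 = s2 / n_total   # second moment about the working mean
    m3 = s3 / n_total   # third moment about the working mean
    m4 = s4 / n_total   # fourth moment about the working mean
    mu2 = m2 - d**2
    mu3 = m3 - 3 * d * m2 + 2 * d**3
    mu4 = m4 - 4 * d * m3 + 6 * d**2 * m2 - 3 * d**4
    return d, mu2, mu3, mu4


def rescale(moments, h):
    """Multiply the moment of order k (k = 1, 2, 3, ...) by h**k,
    e.g. h = 3 to pass from three-year class-intervals to single years."""
    return tuple(m * h**k for k, m in enumerate(moments, start=1))


# Totals of the marriage example, as quoted in the text.
d, mu2, mu3, mu4 = moments_about_mean(301785, 88832, 2155838, 12798362, 137306162)
mu2_in_years = rescale((d, mu2, mu3, mu4), 3)[1]
```

With these inputs `mu2` comes out close to 7.056977 and `mu2_in_years` close to 63.51279, in agreement with the figures given in the example.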

9.6. It will be evident that the labour involved in calculating the 
third and fourth moments is very considerable. Calculating machines
or tables of powers are a great help, and certain tables for the specific
purpose of computing moments will be found in “Tables for Statisticians
and Biometricians, Part”. The student should familiarise himself with
the methods given in the two examples above, since, although we shall not
use them to any great extent in this book, moments are important in
more advanced theory.

Sheppard Corrections for Moments. 

9.7. As in the case of the second moment, the effect due to grouping
at mid-points of intervals may be corrected for by formulae due to W. F.
Sheppard, from whom they derive their name. The formulae for the
second, third and fourth moments are as follows:

$$
\begin{aligned}
\mu_2 \text{ (corrected)} &= \mu_2 - \frac{h^2}{12} \\
\mu_3 \text{ (corrected)} &= \mu_3 \\
\mu_4 \text{ (corrected)} &= \mu_4 - \frac{1}{2}h^2\mu_2 + \frac{7}{240}h^4
\end{aligned}
\tag{9.9}
$$

where $h$ is the width of the class-interval. If we are working in
class-intervals as units, $h$ is taken to be unity.

The use of these formulae is restricted to the cases which we mentioned
in 8.11, i.e. those in which (a) the frequency-distribution is continuous,
and (b) the distribution tapers off to zero in both directions.
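
As an illustration of formulae (9.9), the following Python sketch applies the corrections to a set of moments about the mean. The function name `sheppard_corrected` is an assumption made for this sketch only; the code does not check the two conditions of applicability stated above.

```python
def sheppard_corrected(mu2, mu3, mu4, h=1.0):
    """Apply Sheppard's corrections (9.9) to the second, third and fourth
    moments about the mean; h is the width of the class-interval."""
    mu2_c = mu2 - h**2 / 12
    mu3_c = mu3                                   # the third moment is unchanged
    mu4_c = mu4 - 0.5 * h**2 * mu2 + 7 * h**4 / 240
    return mu2_c, mu3_c, mu4_c


# Moments in class-interval units, so h = 1.
corrected = sheppard_corrected(6.616805, -0.207839, 137.689185)
```

Note that the correction to the fourth moment uses the uncorrected second moment, as in (9.9).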

Example 9.3. In Example 9.1 we found:

$$
\begin{aligned}
\mu_2 &= 6.616,805 \\
\mu_3 &= -0.207,839 \\
\mu_4 &= 137.689,185
\end{aligned}
$$

Applying the above corrections, $h$ being 1:

$$
\begin{aligned}
\mu_2 \text{ (corr.)} &= 6.616,805 - 0.083,333 = 6.533,472 \\
\mu_3 \text{ (corr.)} &= -0.207,839 \\
\mu_4 \text{ (corr.)} &= 137.689,185 - 3.308,402 + 0.029,167 = 134.409,950
\end{aligned}
$$

Example 9.4. In Example 9.2 we have, in units of 3 years:

$$
\begin{aligned}
\mu_2 &= 7.056,977 \\
\mu_3 &= 36.151,595 \\
\mu_4 &= 408.738,210
\end{aligned}
$$

Thus:

$$
\begin{aligned}
\mu_2 \text{ (corr.)} &= 7.056,977 - 0.083,333 = 6.973,644 \\
\mu_3 \text{ (corr.)} &= 36.151,595 \\
\mu_4 \text{ (corr.)} &= 408.738,210 - 3.528,489 + 0.029,167 = 405.238,888
\end{aligned}
$$

In units of one year the corrected moments are given by multiplying
by 9, 27 and 81 as before.





β- and γ-Coefficients.

9.8. Certain quantities calculated from the moments about the mean
are of particular importance in statistical work.

We define

$$
\beta_1 = \frac{\mu_3^2}{\mu_2^3} \tag{9.10}
$$

$$
\beta_2 = \frac{\mu_4}{\mu_2^2} \tag{9.11}
$$

and two further quantities:

$$
\gamma_1 = +\sqrt{\beta_1} \tag{9.12}
$$

$$
\gamma_2 = \beta_2 - 3 = \frac{\mu_4 - 3\mu_2^2}{\mu_2^2} \tag{9.13}
$$

The reason for the introduction of these arbitrary-looking quantities will
appear in the sequel.¹

It is to be noted that these four coefficients are all pure numbers and,
as such, are independent of the scale of measurement of the variable; for
since $\mu_n$ has the dimensions of (variable)$^n$, $\mu_3^2$ has the dimensions
(variable)$^6$ and so has $\mu_2^3$, and hence their quotient has dimension zero,
i.e. is a pure number; and similarly for the quotient of $\mu_4$ and $\mu_2^2$.
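
A minimal Python sketch of definitions (9.10) to (9.13) follows; the function name `beta_gamma` is hypothetical. Because $\gamma_1$ is defined as the positive square root, the sketch does not carry the sign of $\mu_3$.

```python
import math


def beta_gamma(mu2, mu3, mu4):
    """Return (beta1, beta2, gamma1, gamma2) from the moments about the mean."""
    beta1 = mu3**2 / mu2**3
    beta2 = mu4 / mu2**2
    gamma1 = math.sqrt(beta1)   # positive root, as in (9.12)
    gamma2 = beta2 - 3
    return beta1, beta2, gamma1, gamma2


# The coefficients are pure numbers: multiplying the variable by 3 multiplies
# mu2, mu3, mu4 by 9, 27, 81 and leaves all four coefficients unchanged.
in_intervals = beta_gamma(6.973644, 36.151595, 405.238888)
in_years = beta_gamma(9 * 6.973644, 27 * 36.151595, 81 * 405.238888)
```

Comparing `in_intervals` with `in_years` gives a numerical check of the dimensional argument made above.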

Example 9.5. Let us calculate $\beta_1$ and $\beta_2$ for the distribution of
Example 9.1.

We have, using the corrected values of Example 9.3:

$$
\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(-0.207839)^2}{(6.533472)^3} = \frac{0.043197}{278.889} = 0.000155
$$

$$
\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{134.40995}{42.68626} = 3.149
$$

Example 9.6. Similarly, in the data of Example 9.2, using corrected
values:

$$
\beta_1 = \frac{(36.151595)^2}{(6.973644)^3} = 3.854
$$

$$
\beta_2 = \frac{405.238888}{(6.973644)^2} = 8.333
$$


¹ In general, Karl Pearson defines

$$
\beta_{2n+1} = \frac{\mu_3\,\mu_{2n+3}}{\mu_2^{\,n+3}}, \qquad
\beta_{2n} = \frac{\mu_{2n+2}}{\mu_2^{\,n+1}}
$$




It should be noted in this last example that, since the coefficients are 
pure numbers, it does not matter whether we work in units of three years 
or of one year. 


Measures of Skewness. 

9.9. The departure of a frequency-distribution from symmetry has a 
certain interest, and several measures have been devised to permit of the 
measurement of this skewness. Such measures should (a) be pure numbers, 
so as to be independent of the units in which the variable is measured, and 
(b) be zero when the distribution is symmetrical.

9.10. Three such measures deserve mention. In the first place, we
can define

$$
\text{Sk} = \frac{Q_1 + Q_3 - 2\,\mathrm{Mi}}{2Q} \tag{9.14}
$$

This can be put in the form:

$$
\text{Skewness} = \frac{(Q_3 - \mathrm{Mi}) - (\mathrm{Mi} - Q_1)}{(Q_3 - \mathrm{Mi}) + (\mathrm{Mi} - Q_1)} \tag{9.15}
$$

i.e. the skewness is taken to be the difference of the quartile deviations from
the median divided by their sum. It is clearly a pure number, for both
numerator and denominator have the same dimensions, and it is zero when
the distribution is symmetrical. It varies from −1 to +1.¹

This is a rather rough-and-ready measure which might, however, be
useful if we were using the semi-interquartile range as a measure of
dispersion and were unable or unwilling to calculate the standard deviation.

9.11. The most common measure of skewness is Pearson's, defined by

$$
\text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \frac{M - \mathrm{Mo}}{\sigma} \tag{9.16}
$$

This evidently is a pure number and is zero for symmetrical
distributions.

9.12. The calculation of this coefficient of skewness is subject to the 
inconvenience of determining the position of the mode. We may circum¬ 
vent this difficulty in several ways. In the first place, for distributions 
which are obviously not too skew we may use the empirical relation of 7.27. 
We then have: 


$$
\text{Skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}} \tag{9.17}
$$

Secondly, for a large class of curves to which the moderately skew
humped curve is a close approximation, the skewness of equation (9.16)
is given exactly by

$$
\text{Skewness} = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)} \tag{9.18}
$$


¹ In the 10th and previous editions of this book the measure
Skewness $= (Q_1 + Q_3 - 2\,\mathrm{Mi})/Q$ was suggested, i.e. twice the measure
(9.14). The above form has the advantage that its limits are −1 and +1.




We may, therefore, take this to be an approximation to the value given by 
equation (9.16). 

It should be noted that the measures (9.14) and (9.16) are positive if 
the longer tail of the distribution lies toward the higher values of the 
variate (the right) and negative in the contrary case. This accords with
the anticipatory remarks of 6.20. The measure (9.18) is to be regarded 
as without sign. 
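
The three measures may be put side by side in a short Python sketch. The function names are hypothetical, chosen for this illustration only; the formulas are those of equations (9.15), (9.17) and (9.18).

```python
import math


def quartile_skewness(q1, median, q3):
    """Measure (9.14), written in the form (9.15); lies between -1 and +1."""
    upper = q3 - median   # upper quartile deviation from the median
    lower = median - q1   # lower quartile deviation from the median
    return (upper - lower) / (upper + lower)


def mean_median_skewness(mean, median, sd):
    """Approximation (9.17) to Pearson's measure, using the median for the mode."""
    return 3 * (mean - median) / sd


def moment_skewness(beta1, beta2):
    """Expression (9.18) in terms of beta1 and beta2; it carries no sign."""
    return math.sqrt(beta1) * (beta2 + 3) / (2 * (5 * beta2 - 6 * beta1 - 9))


# A symmetrical set of quartiles gives zero skewness by the first measure.
zero = quartile_skewness(1.0, 2.0, 3.0)
```

All three functions return pure numbers; only the first two can be negative.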

Limits of the Measures of Skewness. 

9.13. We have already remarked that the measure given by equation 
(9.14) lies between −1 and +1. There is no limit in theory to the measure
(9.16) or its approximation (9.18), and this is a slight drawback. But
in practice the value given by equation (9.16) is rarely very high, and for
moderately skew single-humped curves is usually less than unity.

It has been shown that the quantity (Mean − Median)/(standard deviation)
lies between the limits −1 and +1, and the measure (9.17) therefore lies
between −3 and +3 (see ref. (161)). In practice it rarely approaches these
limits.

Example 9.7. Let us once again consider the height distribution of
Table 6.7, which has been already discussed in this chapter (Examples 9.1,
9.3 and 9.5).

We have:

Mean (Example 7.1, p. 116)                      = 67.46 inches
S.d. (corrected, Example 8.4, p. 141)           = 2.56 inches
Median (Example 7.8, p. 121)                    = 67.47 inches
$Q_1$ (Example 8.9, p. 148)                     = 65.71 inches
$Q_3$ (ibid.)                                   = 69.21 inches
$Q$ (ibid.)                                     = 1.75 inches
$\beta_1$ (corrected, Example 9.5, p. 161)      = 0.000155
$\beta_2$ (ibid.)                               = 3.149

The measure of skewness (9.14) is, then,

$$
\text{Sk} = \frac{Q_1 + Q_3 - 2\,\mathrm{Mi}}{2Q} = \frac{65.71 + 69.21 - (2 \times 67.47)}{2 \times 1.75} = -0.006
$$

We can clearly place no reliance on this figure. The median and
quartiles were obtained by methods of approximation which we cannot
expect to give accuracy to the second decimal place. We can only
conclude, therefore, that so far as the measure (9.14) is concerned, there
is no significant skewness.

The measure (9.18) gives:

$$
\frac{0.0124 \times 6.149}{2(15.745 - 0.001 - 9)} = \frac{0.0124 \times 6.149}{2 \times 6.744} = 0.006
$$






Here again the skewness is extremely small, and is, in fact, almost
equal to the value given by (9.14).

If we take the measure (9.17) we get:

$$
\text{Sk} = \frac{3(M - \mathrm{Mi})}{\sigma} = \frac{-0.03}{2.56} = -0.012
$$

This value is suspect because we have determined the mean and the
median only to the second decimal place, but clearly the value is small.

We conclude that there is only very slight skewness. At this stage we
cannot say whether such small skewness is significant, but it is at least
probably attributable to sampling fluctuations.

Example 9.8. For the marriage data of Examples 9.2, 9.4 and 9.6
it will be found that, using the working mean as origin:

Mean                          = 0.2944
Median                        = −0.4018
$Q_1$                         = −1.4568
$Q_3$                         = 1.2316

and

$\sigma$ (corrected) (Ex. 8.5) = 2.6408
$\beta_1$                      = 3.854
$\beta_2$                      = 8.333

The measure (9.14) is:

$$
\frac{(Q_3 - \mathrm{Mi}) - (\mathrm{Mi} - Q_1)}{(Q_3 - \mathrm{Mi}) + (\mathrm{Mi} - Q_1)}
= \frac{1.6334 - 1.0550}{1.6334 + 1.0550}
= \frac{0.5784}{2.6884} = 0.22
$$

The measure (9.18) is:

$$
\frac{\sqrt{3.854}\,(11.333)}{2(41.665 - 23.124 - 9)} = \frac{1.963 \times 11.333}{2 \times 9.541} = 1.17
$$

The two are very different, as we might expect, but both indicate
strong positive skewness. As a matter of interest we may compare the
value (9.17), which gives

$$
\frac{3 \times 0.6962}{2.6408} = 0.79
$$


Kurtosis. 

9.14. The coefficient $\beta_2$ or its derivative $\gamma_2$ is used to measure a
property of the single-humped distribution known as kurtosis (κυρτός,
humped).

We take as the standard value of $\beta_2$ the number 3, for reasons which
will appear when we study the so-called “normal” curve (10.24). This
curve is approximately of the shape given in fig. 6.5, page 93. Curves
with values of $\beta_2$ less than 3 will, compared with this, be flat-topped,
and are called platykurtic (πλατύς, broad, + κυρτός). Curves with values
greater than 3 will be peaked more sharply, and are called leptokurtic¹
(λεπτός, narrow, + κυρτός). “Student” gives an amusing mnemonic for
these names: Platykurtic curves, like the platypus, are squat with short
tails. Leptokurtic curves are high with long tails like the kangaroo,
noted for “lepping”!

Example 9.9. In the height distribution of Examples 9.1, 9.3, 9.5
and 9.7:

$$
\beta_2 = 3.149, \qquad \gamma_2 = \beta_2 - 3 = 0.149
$$

Hence the curve is slightly leptokurtic.

On the other hand, in the marriage distribution of Examples 9.2,
9.4, 9.6 and 9.8:

$$
\beta_2 = 8.333, \qquad \gamma_2 = 5.333
$$

and the curve is very leptokurtic.


Seminvariants. 

9.15. We may conclude this chapter by referring briefly to a set of 
quantities similar to moments which have some theoretical and practical 
importance. These are Thiele’s seminvariants. 

The seminvariants are defined by a rather complicated mathematical 
expression which we shall not here reproduce. For present purposes it 
is sufficient to note that the first four seminvariants may be expressed as 
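simple functions of the first four moments. In fact we have:

$$
\begin{aligned}
\lambda_1 &= \mu_1' \\
\lambda_2 &= \mu_2' - \mu_1'^2 \\
\lambda_3 &= \mu_3' - 3\mu_1'\mu_2' + 2\mu_1'^3 \\
\lambda_4 &= \mu_4' - 4\mu_1'\mu_3' - 3\mu_2'^2 + 12\mu_1'^2\mu_2' - 6\mu_1'^4
\end{aligned}
\tag{9.19}
$$

In particular, about the mean,

$$
\begin{aligned}
\lambda_1 &= 0 \\
\lambda_2 &= \mu_2 \\
\lambda_3 &= \mu_3 \\
\lambda_4 &= \mu_4 - 3\mu_2^2
\end{aligned}
\tag{9.20}
$$
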


¹ These terms are due to Karl Pearson and appear to have been given for the first
time in Biometrika, vol. 4, 1905-6, page 169 et seq. By a slip leptokurtosis is there
inadvertently applied to distributions for which $\beta_2 < 3$.





9.16. These relations are used in the calculation of the seminvariants,
the moments being first ascertained in the manner of the earlier sections
of this chapter. For instance, the first four seminvariants of the height
distribution which has served us as an example are, about the mean,

$$
\begin{aligned}
\lambda_1 &= 0 \\
\lambda_2 &= 6.616,805 \\
\lambda_3 &= -0.207,839 \\
\lambda_4 &= 137.689,185 - 3 \times (6.616,805)^2 = 6.342,86
\end{aligned}
$$

if we take uncorrected values of the moments.

9.17. The seminvariants owe their name to two very remarkable
properties. In the first place, all seminvariants except the first are
independent of the origin of calculation. The moments vary according
to the point about which they are calculated, which makes it necessary
to specify the origin A in speaking of them. The seminvariants, on the
other hand, do not, so that it is unnecessary to specify any value A in
giving their values; the sole exception to this rule is the first seminvariant,
which is the same as the first moment.

Secondly, if the scale of measurement of the variate is altered by
multiplying all values by a constant $a$, the nth seminvariant is multiplied
by $a^n$. Thus, in the height distribution, if we change our scale to
centimetres instead of inches, and so multiply all values of the variate by 2.54,
the seminvariants in the previous section are to be multiplied by 2.54,
$2.54^2$, $2.54^3$, $2.54^4$, respectively.
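
The relations (9.19) and the first property can be illustrated by a brief Python sketch. The function name `seminvariants` is hypothetical; the numerical inputs are the moments of the marriage example of this chapter, taken once about the working mean and once about the mean.

```python
def seminvariants(m1, m2, m3, m4):
    """First four seminvariants from the first four moments about any origin,
    following relations (9.19)."""
    l1 = m1
    l2 = m2 - m1**2
    l3 = m3 - 3 * m1 * m2 + 2 * m1**3
    l4 = m4 - 4 * m1 * m3 - 3 * m2**2 + 12 * m1**2 * m2 - 6 * m1**4
    return l1, l2, l3, l4


# Moments of the marriage distribution about the working mean ...
about_working_mean = seminvariants(0.294355253, 7.143622115,
                                   42.408873867, 454.980075219)
# ... and about the mean, where the first moment is zero and (9.20) applies.
about_mean = seminvariants(0.0, 7.056977, 36.151595, 408.738210)
```

Apart from the first, the two sets of values agree to within the rounding of the moments, which is the independence of origin described above.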


SUMMARY. 

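1. The nth moment about the point A is defined as

$$
\mu_n' = \frac{1}{N} S(f\xi^n)
$$

where $\xi = X - A$, and X is the value of the variate.

2. The nth moment about the mean is written $\mu_n$.

3.

$$
\mu_n = \mu_n' - {}^nC_1\, d\, \mu_{n-1}' + {}^nC_2\, d^2 \mu_{n-2}' - \ldots + (-1)^n d^n
$$

where

$$
d = M - A
$$

and in particular

$$
\begin{aligned}
\mu_2 &= \mu_2' - d^2 \\
\mu_3 &= \mu_3' - 3d\mu_2' + 2d^3 \\
\mu_4 &= \mu_4' - 4d\mu_3' + 6d^2\mu_2' - 3d^4
\end{aligned}
$$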

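4. Sheppard’s corrections for the moments are:

$$
\begin{aligned}
\mu_2 \text{ (corrected)} &= \mu_2 - \frac{h^2}{12} \\
\mu_3 \text{ (corrected)} &= \mu_3 \\
\mu_4 \text{ (corrected)} &= \mu_4 - \frac{1}{2}h^2\mu_2 + \frac{7}{240}h^4
\end{aligned}
$$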



5.

$$
\begin{aligned}
\beta_1 &= \frac{\mu_3^2}{\mu_2^3}, & \gamma_1 &= +\sqrt{\beta_1} \\
\beta_2 &= \frac{\mu_4}{\mu_2^2}, & \gamma_2 &= \beta_2 - 3 = \frac{\mu_4 - 3\mu_2^2}{\mu_2^2}
\end{aligned}
$$

6. Pearson's measure of skewness is given by

$$
\text{Sk} = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}
$$

which, for a large class of curves, is equal to

$$
\frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}
$$

7. If the standard deviation is not known, a rough measure of skewness
is obtained by taking

$$
\text{Sk} = \frac{Q_1 + Q_3 - 2\,\mathrm{Mi}}{2Q}
$$

8. Distributions for which $\beta_2 > 3$ are said to be leptokurtic; those for
which $\beta_2 < 3$ are platykurtic.

9. The first four seminvariants, in terms of the moments about the
mean, are:

$$
\begin{aligned}
\lambda_1 &= 0 \\
\lambda_2 &= \mu_2 \\
\lambda_3 &= \mu_3 \\
\lambda_4 &= \mu_4 - 3\mu_2^2
\end{aligned}
$$

10. The seminvariants are independent of the origin of calculation,
except the first, which is equal to the mean.


EXERCISES. 

9.1. Find the first four moments about the mean of the distribution of
males in the United Kingdom according to weight given in Exercise 6.6, page 111.
(Correct your values for grouping.)

Hence find $\beta_1$ and $\beta_2$ and measure the kurtosis of the distribution.

9.2. For the same distribution find the three measures of skewness,
approximating to the mode by the empirical relation of 7.27.

9.3. Find the first four moments about the mean, the values of $\beta_1$, $\beta_2$, and
the three measures of skewness for the following distribution (see table, p. 168).
(Apply Sheppard's corrections.)

9.4. In the data of Example 9.1, group the individuals by intervals of three
inches (57-, 60-, etc.) and calculate the first four moments about the mean.
Compare your results with those of Example 9.1, (a) before Sheppard’s corrections
are applied, and (b) after Sheppard's corrections are applied.

9.5. Find the third and fourth moments about the mean of the binomial
series:

$$
q^n,\quad nq^{n-1}p,\quad \frac{n(n-1)}{1 \cdot 2}\,q^{n-2}p^2,\quad \ldots \qquad \text{where } p + q = 1
$$

(continuing the work of Exercise 8.10, p. 153).




4912 Cows Classified according to their Yield of Milk. (Data from J. F. Tocher, “An
Investigation of the Milk Yield of Dairy Cows,” Biometrika, vol. 20B, 1928,
pp. 105-244.)

Yield of Milk                     Yield of Milk
(gallons per week).   Number of   (gallons per week).   Number of
(Central Value of     Cows.       (Central Value of     Cows.
Interval.)                        Interval.)

         8                1               23               214
         9                5               24               153
        10               13               25               112
        11               33               26                58
        12               71               27                35
        13              151               28                13
        14              230               29                15
        15              339               30                 4
        16              499               31                 5
        17              552               32                 2
        18              585               33                 1
        19              586               34                 1
        20              496
        21              448             Total             4912
        22              284



9.6. The first four moments of a distribution about the value 4 are −1.5, 17,
−30 and 108; find the moments about the mean and the origin.

9.7. Show that for a symmetrical distribution all moments about the mean
of odd order are zero.

9.8. Show that for any distribution $\beta_2 \geq 1$.

9.9. Calculate the second, third and fourth seminvariants of the
distribution of Australian marriages of Example 9.2, (a) from the moments about the
mean, using equation (9.20), and (b) from the moments about the value 28.5,
using equation (9.19); and hence verify that the values of the seminvariants
are independent of the origin of calculation. (Use uncorrected values of the
moments.)

9.10. Show that 

d - A, 


tf 



CHAPTER 10. 


THREE IMPORTANT THEORETICAL DISTRIBUTIONS — 

THE BINOMIAL, THE NORMAL AND THE POISSON. 

Theoretical Distributions. 

10.1. In the examples of frequency-distributions which we have given 
in Chapter 6 and subsequent chapters we have been careful to take 
data from observation and experiment. It is possible, however, starting 
with certain general hypotheses, to deduce mathematically what the 
frequency-distributions of certain universes should be. Such distributions
we shall call theoretical.

10.2. There are three theoretical distributions which, from their 
historical interest as well as their intrinsic importance, occupy a position 
in the forefront of statistical theory. They are, in the order of their
discovery, the Binomial (due to James Bernoulli, circa 1700), the Normal
(due to Demoivre, but more often associated with the names of Laplace
and Gauss, who discussed it at the close of the eighteenth and the beginning
of the nineteenth centuries), and the Poisson (due to S. D. Poisson, who
published it in 1837).

These three are, so to speak, the classical distributions. Certain others 
were discovered during the nineteenth century, but it was not until the
end of the century that there began the second period of statistical dis¬ 
covery which has since given us a wealth of theoretical distributions. Even 
this latest crop depends to some extent on the properties of the first three, 
and particularly of the Normal Distribution. The three therefore form, 
historically and logically, the starting-point of the theory of particular 
distributions, and in this chapter we propose to give an account of their 
main properties. 

The Binomial Distribution. 

10.3. If we may regard an ideal coin as a uniform, homogeneous 
circular disc, there is nothing which can make it tend to fall more often on 
the one side than on the other ; we may expect, therefore, that in any long 
series of throws the coin will fall with either face uppermost an
approximately equal number of times, or with, say, heads uppermost
approximately half the times. Similarly, if we may regard the ideal die as a
perfect homogeneous cube, it will tend, in any long series of throws, to fall 
with each of its six faces uppermost an approximately equal number of 
times, or with any given face uppermost one-sixth of the whole number of 
times. These results are sometimes expressed by saying that the chance 
of throwing heads (or tails) with a coin is 1/2, and the chance of throwing 
six (or any other face) with a die is 1/6. To avoid speaking of such
particular instances as coins or dice, we shall in future, using terms which
have become conventional, refer to an event the chance of success of
which is p and the chance of failure q. Obviously p + q = 1.

10.4. We will now assume that the events in a number of trials are
all independent, i.e. that the chances p and q are the same for each event
and remain constant throughout the trials. The case corresponds to the
tossing of perfect coins or the throwing of perfect dice.

Suppose now we take a number of sets of n trials and count the number
of successes in each set; for example, we might toss a coin ten times for
each set, and observe the number of heads in each set of ten. In general,
there will be some sets with no successes, some with one success, some with
two successes, and so on. Hence, if we classify the sets according to the
number of successes which they contain we shall get a
frequency-distribution. Table 6.15, page 107, gives such a distribution for some
dice-throwing experiments. We shall now see how, on the assumption of
independence of successive events to which we have just referred, the
nature of this distribution may be theoretically determined.

10.5. For the case of single events we expect in N trials to get Np
successes and Nq failures.

Suppose now we take N pairs of events, i.e. two to the set. There will
be Nq cases in which the first event is a failure, and, in virtue of the
independence of the events, among these Nq there will be Nq × q failures, and
Nq × p successes, of the second event on the average. Similarly, of the Np
cases in which the first event was a success, the second event will, on the
average, be a success in Np × p and a failure in Np × q cases. Hence there
will be $Nq^2$ cases in which both events are failures, $2Npq$ cases with one
success and one failure, and $Np^2$ cases in which both are successes.

If we now take N sets of three events, we see that, of the $Nq^2$ cases in
which the first two events were failures, $Nq^2 \times q$ will give a third failure
and $Nq^2 \times p$ one success; of the $2Npq$ cases, $2Npq^2$ will give two failures
and a success and $2Np^2q$ one failure and two successes; and of the $Np^2$
cases, $Np^2q$ will give one failure and two successes and $Np^3$ will give three
successes. Hence the number of sets with 3 failures, 2 failures and 1
success, 1 failure and 2 successes, and 3 successes are, respectively,

$$
Nq^3,\quad 3Nq^2p,\quad 3Nqp^2,\quad Np^3
$$

10.6. From these results it is evident that the frequencies of 0, 1, 2,
. . . successes are given

for one event by the binomial expansion of $N(q + p)$
for two events by the binomial expansion of $N(q + p)^2$
for three events by the binomial expansion of $N(q + p)^3$

In general, for n events the frequencies of successes in N sets are given
by the successive terms in the binomial expansion of $N(q + p)^n$, i.e.

$$
N\left\{q^n + nq^{n-1}p + \frac{n(n-1)}{1 \cdot 2}\,q^{n-2}p^2 + \frac{n(n-1)(n-2)}{1 \cdot 2 \cdot 3}\,q^{n-3}p^3 + \ldots\right\}
$$

This is the so-called Binomial Distribution.
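
The successive terms of $N(q + p)^n$ are easily tabulated. The following Python sketch is an illustration only; the function name `binomial_frequencies` is hypothetical, and `math.comb` from the standard library supplies the binomial coefficients.

```python
from math import comb


def binomial_frequencies(N, n, p):
    """Expected numbers of sets, out of N, showing 0, 1, ..., n successes
    in n independent events, each with chance of success p."""
    q = 1 - p
    return [N * comb(n, k) * q**(n - k) * p**k for k in range(n + 1)]


# 100 sets of 10 tosses of a perfect coin.
freqs = binomial_frequencies(100, 10, 0.5)
seven_heads = freqs[7]               # sets with exactly 7 heads
seven_or_more = sum(freqs[7:])       # sets with at least 7 heads
```

For the coin the sketch gives about 12 sets with exactly seven heads and about 17 with seven or more; the terms always sum to N, since $(q + p)^n = 1$.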

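Example 10.1. If we take 100 sets of 10 tosses of a perfect coin, in
how many cases should we expect to get 7 heads and 3 tails?

Here $p = \tfrac{1}{2}$, $q = \tfrac{1}{2}$.

Hence, the numbers of successes 0, 1, . . . 10 are the terms in
$100\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^{10}$, i.e.

$$
100\left\{\left(\tfrac{1}{2}\right)^{10} + 10\left(\tfrac{1}{2}\right)^{9}\left(\tfrac{1}{2}\right) + \frac{10 \cdot 9}{1 \cdot 2}\left(\tfrac{1}{2}\right)^{8}\left(\tfrac{1}{2}\right)^{2} + \ldots\right\}
$$

The term giving 7 successes and 3 failures is:

$$
100 \times {}^{10}C_7\left(\tfrac{1}{2}\right)^{7}\left(\tfrac{1}{2}\right)^{3}
= 100 \times \frac{10 \cdot 9 \cdot 8}{1 \cdot 2 \cdot 3} \times \frac{1}{2^{10}}
= \frac{3000}{256}
= 12 \text{ approximately}
$$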

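Example 10.2. In the previous example, in how many cases should
we expect to get 7 heads at least? As before, the numbers of successes
are the terms in

$$
\frac{100}{2^{10}}\left\{1 + 10 + \frac{10 \cdot 9}{1 \cdot 2} + \ldots\right\}
$$

We require the sum of terms with 7, 8, 9, 10 successes. Our expected
number is, then,

$$
\begin{aligned}
\frac{100}{2^{10}}\left\{{}^{10}C_7 + {}^{10}C_8 + {}^{10}C_9 + {}^{10}C_{10}\right\}
&= \frac{100}{2^{10}}\left\{\frac{10 \cdot 9 \cdot 8}{1 \cdot 2 \cdot 3} + \frac{10 \cdot 9}{1 \cdot 2} + 10 + 1\right\} \\
&= \frac{100}{2^{10}}\{176\} \\
&= \frac{1100}{64} \\
&= 17 \text{ approximately}
\end{aligned}
$$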

General Form of the Binomial Distribution. 

10.7. The form of the binomial distribution depends (1) on the values 
of p and q , (2) on the value of the exponent n. 

If p and q are equal the distribution is evidently symmetrical, for p 
and q may be interchanged without altering the value of any term, and 
consequently terms equidistant from the two ends of the series are equal. 

If, on the other hand, p and q are unequal, the distribution is skew. 
The following table shows the calculated distributions for n = 20 and
values of p, proceeding by 0.1, from 0.1 to 0.5. When p = 0.1, cases of
two successes are the most frequent, but cases of one success almost
equally frequent: even nine successes may, however, occur about once
in 10,000 trials. As p is increased, the position of the maximum frequency
gradually advances, and the two tails of the distribution become more
nearly equal, until p = 0.5, when the distribution is symmetrical. Of
course, if the table were continued, the distribution for p = 0.6 would be
similar to that for q = 0.6, but reversed end for end, and so on.



Table 10.1. Terms of the Binomial Series $10,000\,(q + p)^{20}$ for Values of p
from 0.1 to 0.5. (Figures given to the nearest unit.)

Number of     p = 0.1   p = 0.2   p = 0.3   p = 0.4   p = 0.5
Successes.    q = 0.9   q = 0.8   q = 0.7   q = 0.6   q = 0.5

     0          1216       115         8         -         -
     1          2702       576        68         5         -
     2          2852      1369       278        31         2
     3          1901      2054       716       123        11
     4           898      2182      1304       350        46
     5           319      1746      1789       746       148
     6            89      1091      1916      1244       370
     7            20       545      1643      1659       739
     8             4       222      1144      1797      1201
     9             1        74       654      1597      1602
    10             -        20       308      1171      1762
    11             -         5       120       710      1602
    12             -         1        39       355      1201
    13             -         -        10       146       739
    14             -         -         2        49       370
    15             -         -         -        13       148
    16             -         -         -         3        46
    17             -         -         -         -        11
    18             -         -         -         -         2
    19             -         -         -         -         -
    20             -         -         -         -         -



10.8. If p = q, the effect of increasing n is to raise the mean and
increase the dispersion. If p is not equal to q, however, not only does an
increase in n raise the mean and increase the dispersion, but it also lessens
the asymmetry; the greater n, for the same values of p and q, the less the
asymmetry. Thus, if we compare the first distribution of the above table
with that given by n = 100, we have the following:


Table 10.2. Terms of the Binomial Series $10,000\,(0.9 + 0.1)^{100}$. (Figures given
to the nearest unit.)

Number of                 Number of                 Number of
Successes.   Frequency.   Successes.   Frequency.   Successes.   Frequency.

     0            -            8          1148          16          193
     1            3            9          1304          17          106
     2           16           10          1319          18           54
     3           59           11          1199          19           26
     4          159           12           988          20           12
     5          339           13           743          21            5
     6          596           14           513          22            2
     7          889           15           327          23            1



The maximum frequencies now occur for 9 and 10 successes, and the two
“tails” are much more nearly equal. If, on the other hand, n is reduced
to 2, the distribution is:

Number of
Successes.    Frequency.

     0           8100
     1           1800
     2            100

and the maximum frequency is at one end of the range.

The tendency towards symmetry may be seen from fig. 10.1, in which
the binomial $(0.9 + 0.1)^n$ has been drawn for various values of n. See
also 10.12 below.

Fig. 10.1. Frequency-polygons of the Binomial $(0.9 + 0.1)^n$ for Various Values of n.


Constants of the Binomial Distribution. 

10.9. We proceed to find the lower moments of the distribution
$N(q + p)^n$.

Taking an arbitrary origin at 0 successes, we have the successive
deviations $\xi$ as 0, 1, 2, . . . n, and hence,

$$
\begin{aligned}
\mu_1' &= (q^n \times 0) + ({}^nC_1 q^{n-1}p \times 1) + ({}^nC_2 q^{n-2}p^2 \times 2) + \ldots + (p^n \times n) \\
&= p\{nq^{n-1} + n(n-1)q^{n-2}p + \ldots + np^{n-1}\} \\
&= np(q + p)^{n-1}
\end{aligned}
$$

Now, $q + p = 1$.

Hence, $\mu_1' = np$.

That is, the mean M is np.






We have, further,

$$
\begin{aligned}
\mu_2' &= (q^n \times 0) + ({}^nC_1 q^{n-1}p \times 1^2) + ({}^nC_2 q^{n-2}p^2 \times 2^2) + \ldots + (p^n \times n^2) \\
&= np\left\{q^{n-1} + 2(n-1)q^{n-2}p + \frac{3(n-1)(n-2)}{1 \cdot 2}\,q^{n-3}p^2 + \ldots + np^{n-1}\right\}
\end{aligned}
$$

The expression in brackets is the first moment of the binomial $(q + p)^{n-1}$
about origin −1, and hence is equal to $(n - 1)p + 1$.

Hence,

$$
\mu_2' = np\{(n-1)p + 1\}
$$

It may also be shown in a similar way (but we omit the proof) that

$$
\begin{aligned}
\mu_3' &= np\{(n-1)(n-2)p^2 + 3(n-1)p + 1\} \\
\mu_4' &= np\{(n-1)(n-2)(n-3)p^3 + 6(n-1)(n-2)p^2 + 7(n-1)p + 1\}
\end{aligned}
$$
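
Since the proofs for the third and fourth moments are omitted, a numerical check may be reassuring. The Python sketch below (function names hypothetical) sums the series directly about the origin of 0 successes and sets the result beside the closed expressions just given.

```python
from math import comb


def moment_about_zero(n, p, r):
    """r-th moment of the binomial (q + p)^n about 0 successes, by direct summation."""
    q = 1 - p
    return sum(comb(n, k) * q**(n - k) * p**k * k**r for k in range(n + 1))


def closed_form_moments(n, p):
    """The closed expressions of the text for the first four moments about 0."""
    m1 = n * p
    m2 = n * p * ((n - 1) * p + 1)
    m3 = n * p * ((n - 1) * (n - 2) * p**2 + 3 * (n - 1) * p + 1)
    m4 = n * p * ((n - 1) * (n - 2) * (n - 3) * p**3
                  + 6 * (n - 1) * (n - 2) * p**2 + 7 * (n - 1) * p + 1)
    return m1, m2, m3, m4


# Compare the two for the binomial with n = 20, p = 0.1.
direct = tuple(moment_about_zero(20, 0.1, r) for r in (1, 2, 3, 4))
closed = closed_form_moments(20, 0.1)
```

Agreement for a handful of values of n and p is, of course, a check and not a proof.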


10.10. From these results we may find the moments about the mean.
We have:

$$
\begin{aligned}
\mu_2 &= \mu_2' - d^2 \\
&= np\{(n-1)p + 1\} - n^2p^2 \\
&= np(1 - p) \\
&= npq
\end{aligned}
$$

Hence we have the important result that

$$
\sigma = \sqrt{npq} \tag{10.1}
$$

10.11. Similarly, it will be found that

$$
\mu_3 = npq(q - p) \tag{10.2}
$$

$$
\mu_4 = 3p^2q^2n^2 + pqn(1 - 6pq) \tag{10.3}
$$

Hence,

$$
\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(q - p)^2}{npq} \tag{10.4}
$$

$$
\beta_2 = \frac{\mu_4}{\mu_2^2} = 3 + \frac{1 - 6pq}{npq} \tag{10.5}
$$



10.12. Thus the binomial distribution has mean np and standard
deviation $\sqrt{npq}$. It is instructive to note that $\beta_1$ and $(\beta_2 - 3)$ are both of
order $\frac{1}{n}$. Hence, as n becomes larger, the distribution tends to symmetry
and zero kurtosis.

The values of $\beta_1$ and $\beta_2$ for some values of p and q and ranges of n are
shown in Tables 10.3, 10.4 and 10.5.

From an inspection of these tables it will be seen that even for an
extremely small value of p the binomial tends to zero $\beta_1$ and zero kurtosis
for values of n well within practical limits. For the symmetrical binomial
p = q = 0.5, $\beta_1$ is of course zero, and $\beta_2$ rapidly approaches 3.
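
The constants of equations (10.1), (10.4) and (10.5) can be evaluated for any n and p with a few lines of Python. This is a sketch under the formulas above; the function name `binomial_constants` is hypothetical.

```python
from math import sqrt


def binomial_constants(n, p):
    """Mean, standard deviation, beta1 and beta2 of the binomial (q + p)^n."""
    q = 1 - p
    mean = n * p
    sd = sqrt(n * p * q)                      # equation (10.1)
    beta1 = (q - p)**2 / (n * p * q)          # equation (10.4)
    beta2 = 3 + (1 - 6 * p * q) / (n * p * q) # equation (10.5)
    return mean, sd, beta1, beta2


# beta1 and beta2 - 3 both fall off as 1/n, even for a very small p.
for n in (100, 1000):
    mean, sd, beta1, beta2 = binomial_constants(n, 0.02)
    print(n, round(beta1, 4), round(beta2, 4))
```

For p = 0.5 the function returns $\beta_1 = 0$ for every n, and $\beta_2 = 3 - 2/n$.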



Table 10.3. Values of $\beta_1$ and $\beta_2$ for the Binomial with p = 0.02, q = 0.98.
(From M. Greenwood, Biometrika, vol. 9, 1913, p. 69.)

    n.        $\beta_1$.    $\beta_2$.

    100       0.4702        3.4502
    200       0.2351        3.2251
    300       0.1567        3.1501
    400       0.1176        3.1126
    500       0.0940        3.0900
    600       0.0784        3.0750
    700       0.0672        3.0643
    800       0.0588        3.0563
    900       0.0522        3.0500
   1000       0.0470        3.0450

Table 10.4. Values of $\beta_1$ and $\beta_2$ for the Binomial with p = 0.1, q = 0.9.

    n.        $\beta_1$.    $\beta_2$.

    100       0.0711        3.0511
    200       0.0356        3.0256
   1000       0.0071        3.0051

Table 10.5. Values of $\beta_2$ for the Binomial with p = 0.5, q = 0.5.

    n.        $\beta_2$.

      4       2.5
      6       2.6667
      8       2.75
     10       2.8
     50       2.96
    100       2.98
   1000       2.998



Mechanical Representation of the Binomial Distribution. 

10.13, There is an interesting mechanical method of constructing a 
representation of the binomial series. The apparatus, which is illustrated
in fig. 10.2, consists of a funnel opening into a space say a { inch in depth 
—between a sheet of glass and a back-board. This space is broken up by 
successive rows of wedges like 1, 2 3, 4 5 6, etc., which will divide up into 
streams any granular material such as shot or mustard seed which is poured 
through the funnel when the apparatus is held at a slope. At the foot 
these wedges are replaced by vertical strips, in the spaces between which the
material can collect. Consider the stream of material that comes from the
funnel and meets the wedge 1. This wedge is set so as to throw q parts of
the stream to the left and p parts to the right (of the observer). The 






176 THEORY OF STATISTICS. 

wedges 2 and 8 are set so as to divide the resultant streams in the same 
proportions. Thus wedge 2 throws q 2 parts of the original material to the 
left and qp to the right, wedge 8 throws pq parts of the original material 
to the left and p 2 to the right. The streams passing these wedges are 
therefore in the ratio of q 2 : 2 qp : j) 2 . The next row of wedges is again set 

so as to divide these streams in the 
same proportions as before, and the 
four streams that result will bear the 
proportions q 3 : 8q 2 p : 8qp 2 : p 3 . The 
final set, at the heads of the vertical 
strips, will give the streams proportions 
q 4 : 4q s p : ( >q 2 p 2 : i-qp 2 : p 4 , and these 
streams will accumulate between the 
strips and give a representation of the 
binomial by a kind of histogram, as 
shown. Of course as many rows of 
wedges may be provided as may be 
desired. 

This kind of apparatus was originally devised by Sir Francis Galton
(ref. (170)) in a form that gave roughly the symmetrical binomial, a stream of
shot being allowed to fall through rows of nails, and the resultant streams
being collected in partitioned spaces. The apparatus was generalised by Karl
Pearson, who used rows of wedges fixed to movable slides, so that they
could be adjusted to give any ratio of $q : p$ (ref. (174)).

Fig. 10.2. The Pearson-Galton Binomial Apparatus.

10.14. It must not be forgotten that although we have spoken in 10.12
of the skewness and kurtosis of the binomial distribution, it is essentially
discontinuous. This is a serious limitation.

Consider, for example, the frequency-distribution of the number of male
births in batches of 10,000 births, the mean number being, say, 5100. The
distribution will be given by the terms of the series $(0.49 + 0.51)^{10,000}$, and
the standard deviation is, in round numbers, 50 births. The distribution
will therefore extend to some 150 births or more on either side of the mean
number, and in order to obtain it we should have to calculate some 300
terms of a binomial series with an exponent of 10,000! This would not
only be practically impossible without the use of certain methods of
approximation, but it would give the distribution in quite unnecessary
detail: as a matter of practice, we should not have compiled a frequency-
distribution by single male births, but should certainly have grouped our
observations, taking probably 10 births as the class-interval. We want,
therefore, to replace the binomial polygon by some continuous curve,
having approximately the same ordinates, the curve being such that the
area between any two ordinates $y_1$ and $y_2$ will give the frequency of
observations between the corresponding values of the variable $x_1$ and
$x_2$.


Limiting Form of the Binomial for Large n. 

10.15. When $n$ becomes large, each term of the binomial becomes
small. We are, however, concerned with the sum of the terms falling
within certain ranges, and these will not be small in general.

Let us consider first of all the case when $p$ and $q$ are equal. The terms
of the series are:

$$1 + n + \frac{n(n-1)}{1 \cdot 2} + \frac{n(n-1)(n-2)}{1 \cdot 2 \cdot 3} + \dots$$

The frequency of $m$ successes is

$$N\left(\tfrac{1}{2}\right)^n \frac{n!}{m!\,(n-m)!}$$

and the frequency of $m+1$ successes is derived from this by multiplying
it by $\dfrac{n-m}{m+1}$. The latter frequency is therefore greater than the
former so long as

$$n - m > m + 1$$

or

$$m < \frac{n-1}{2}$$

Suppose, for simplicity, that $n$ is even, say equal to $2k$; then the frequency
of $k$ successes is the greatest, and its value is

$$y_0 = N\left(\tfrac{1}{2}\right)^{2k} \frac{(2k)!}{k!\,k!} \qquad (10.6)$$


The polygon tails off symmetrically on either side of this greatest ordinate.
Consider the frequency of $k + x$ successes; the value is

$$y_x = N\left(\tfrac{1}{2}\right)^{2k} \frac{(2k)!}{(k+x)!\,(k-x)!} \qquad (10.7)$$

and therefore

$$\frac{y_x}{y_0} = \frac{k(k-1)(k-2) \dots (k-x+1)}{(k+1)(k+2)(k+3) \dots (k+x)} \qquad (10.8)$$


Now let us approximate by assuming that $k$ is very large, and indeed
large compared with $x$, so that $(x/k)^2$ may be neglected compared with
$(x/k)$. This assumption does not involve any difficulty, for we need not
consider values of $x$ much greater than three times the standard deviation
or $3\sqrt{k/2}$, and the ratio of this to $k$ is $3/\sqrt{2k}$, which is necessarily small
if $k$ be large. On this assumption we may apply the logarithmic series

$$\log_e(1 + \delta) = \delta - \frac{\delta^2}{2} + \frac{\delta^3}{3} - \frac{\delta^4}{4} + \dots$$

to every bracket in the fraction (10.8), after dividing numerator and
denominator by $k^x$, and neglect all terms beyond the first. To this degree
of approximation,

$$\log_e \frac{y_x}{y_0} = -\frac{2}{k}\left(1 + 2 + 3 + \dots + (x-1)\right) - \frac{x}{k}$$

$$= -\frac{x(x-1)}{k} - \frac{x}{k}$$

$$= -\frac{x^2}{k}$$

Therefore, finally

$$y_x = y_0\, e^{-x^2/k} = y_0\, e^{-x^2/2\sigma^2} \qquad (10.9)$$

where, in the last expression, the constant $k$ has been replaced by the
standard deviation $\sigma$, for $\sigma^2 = k/2$.
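The closeness of the approximation (10.9) can be checked numerically. The sketch below is illustrative only: the function names and the choice $k = 500$ are assumptions made here, not taken from the text. It compares the exact ratio $y_x/y_0$ of (10.8) with the limiting value $e^{-x^2/k}$.

```python
import math


def exact_ratio(k, x):
    """y_x / y_0 for the symmetrical binomial with n = 2k, as in (10.8)."""
    return math.comb(2 * k, k + x) / math.comb(2 * k, k)


def normal_ratio(k, x):
    """Limiting form (10.9): exp(-x^2 / k), i.e. exp(-x^2 / (2 sigma^2))."""
    return math.exp(-x * x / k)


k = 500                      # n = 1000 trials, sigma = sqrt(k / 2), about 15.8
for x in (0, 5, 15, 30, 45):
    print(x, round(exact_ratio(k, x), 5), round(normal_ratio(k, x), 5))
```

The two columns should agree closely over the range of about three standard deviations that matters in practice, and the agreement improves as `k` is increased.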

10.16. The case when $p$ is not equal to $q$ may be treated in a somewhat
similar way but is slightly more complicated.

As before, the frequency of $m$ successes is

$$N\, {}^nC_m\, q^{n-m} p^m = N \frac{n!}{m!\,(n-m)!}\, q^{n-m} p^m$$

The frequency of $(m+1)$ successes is derived by multiplying this
expression by $\dfrac{n-m}{m+1} \cdot \dfrac{p}{q}$ and hence is greater than the former if

$$\frac{n-m}{m+1} \cdot \frac{p}{q} > 1$$

or

$$m < np - q$$

Let us assume that $np$ is a whole number. Since $n$ is going to tend
to infinity, this really imposes no limitation on our work.

The maximum frequency is, then,

$$y_0 = N \frac{n!}{(np)!\,(nq)!}\, q^{nq} p^{np} \qquad (10.10)$$

The frequency of $pn + x$ successes is

$$y_x = N \frac{n!}{(np+x)!\,(nq-x)!}\, q^{nq-x} p^{np+x} \qquad (10.11)$$

Hence,

$$\frac{y_x}{y_0} = \frac{(np)!\,(nq)!}{(np+x)!\,(nq-x)!} \cdot \frac{p^x}{q^x} \qquad (10.12)$$


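Now, by an important theorem due to James Stirling (1730), if $n$ be large,
we have approximately

$$n! = \sqrt{2n\pi}\; n^n e^{-n}$$

Applying this formula here:

$$\frac{y_x}{y_0} = \frac{\sqrt{2np\pi}\,(np)^{np} e^{-np}\; \sqrt{2nq\pi}\,(nq)^{nq} e^{-nq}\; p^x}{\sqrt{2(np+x)\pi}\,(np+x)^{np+x} e^{-np-x}\; \sqrt{2(nq-x)\pi}\,(nq-x)^{nq-x} e^{-nq+x}\; q^x}$$

which reduces to

$$\frac{y_x}{y_0} = \frac{1}{\left(1 + \dfrac{x}{np}\right)^{np+x+\frac{1}{2}} \left(1 - \dfrac{x}{nq}\right)^{nq-x+\frac{1}{2}}}$$

Hence,

$$\log_e \frac{y_x}{y_0} = -\left(np + x + \tfrac{1}{2}\right) \log_e\left(1 + \frac{x}{np}\right) - \left(nq - x + \tfrac{1}{2}\right) \log_e\left(1 - \frac{x}{nq}\right)$$

$$= -\left(np + x + \tfrac{1}{2}\right)\left(\frac{x}{np} - \frac{x^2}{2n^2p^2} + \frac{x^3}{3n^3p^3} - \dots\right) + \left(nq - x + \tfrac{1}{2}\right)\left(\frac{x}{nq} + \frac{x^2}{2n^2q^2} + \frac{x^3}{3n^3q^3} + \dots\right)$$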

After a little rearrangement this becomes:

$$\log_e \frac{y_x}{y_0} = -\frac{x^2}{2npq} + \frac{x^2(p^2+q^2)}{4n^2p^2q^2} + \frac{p-q}{2npq}\,x + \frac{q^2-p^2}{6n^2p^2q^2}\,x^3 + \text{terms of order } \frac{1}{n^3} \text{ and higher}$$

Since $q + p = 1$, we have, neglecting the terms of order $\dfrac{1}{n^3}$ and higher,
which are small compared with the others when $n$ is large:

$$\log_e \frac{y_x}{y_0} = -\frac{x^2}{2npq} + \frac{x^2(p^2+q^2)}{4n^2p^2q^2} + \frac{q-p}{2npq}\left(-x + \frac{x^3}{3npq}\right) \qquad (10.13)$$


Put, as before, $npq = \sigma^2$, where $\sigma$ is the standard deviation of the
binomial. If $n$ be large, the second term is small compared with the first.
Further, since we need not consider values of $x/\sigma$ much greater than 3,
if $\dfrac{q-p}{\sqrt{npq}}$ be small, we can neglect the whole of the third term.
Under these assumptions we have:

$$\log_e \frac{y_x}{y_0} = -\frac{x^2}{2\sigma^2}$$

or

$$y_x = y_0\, e^{-x^2/2\sigma^2} \qquad (10.14)$$

as before.

The expression $\dfrac{q-p}{\sqrt{npq}}$ is merely $\sqrt{\beta_1}$, and so we have in effect simply
assumed $\beta_1$ small; however much $p$ and $q$ differ we can always make
$\sqrt{\beta_1}$ as small as we please by increasing $n$ sufficiently.

10.17. Hence, whether or not $p$ is equal to $q$, the binomial distribution
tends to the form of the continuous curve ((10.9) and (10.14)) when
$n$ becomes large, at least for the material part of the range. As a matter
of fact, the correspondence between the binomial and the curve is
surprisingly close even for comparatively low values of $n$, provided that
$p$ and $q$ are fairly near equality. The student may care to draw the curve
with the aid of the tables given at the end of this book (see below, 10.26)
and compare it with some of the simpler binomials drawn to the same
scale.
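The comparison suggested above can also be made by calculation instead of by drawing. In the following sketch the helper names (`binomial_freq`, `normal_ordinate`, `max_gap`) and the particular values of $n$ and $p$ are assumptions chosen for illustration. For unit total frequency it finds the largest difference between a binomial term and the ordinate of the normal curve having the same mean $np$ and variance $npq$.

```python
import math


def binomial_freq(n, p, m):
    """Proportional frequency of m successes: C(n, m) q^(n-m) p^m."""
    return math.comb(n, m) * (1 - p) ** (n - m) * p ** m


def normal_ordinate(n, p, m):
    """Ordinate at m of the unit-area normal curve with mean np, variance npq."""
    sigma2 = n * p * (1 - p)
    return math.exp(-(m - n * p) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def max_gap(n, p):
    """Largest absolute difference between the two over m = 0, 1, ..., n."""
    return max(abs(binomial_freq(n, p, m) - normal_ordinate(n, p, m))
               for m in range(n + 1))


for n in (10, 50, 400):
    print(n, round(max_gap(n, 0.5), 5), round(max_gap(n, 0.2), 5))
```

For each `n` the gap should be smaller with `p = 0.5` than with `p = 0.2`, and both should shrink as `n` grows, which is the behaviour described in 10.16 and 10.17.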

10.18. The curve

$$y = y_0\, e^{-x^2/2\sigma^2}$$

is called the normal curve. A universe classified according to a
continuous variate whose ideal frequency-distribution is a normal curve is
called a normal universe.

The applications of the normal curve are by no means limited to 
distributions of the binomial type. Before we refer to its many practical 
and theoretical applications, however, we shall give a short account of 
its main properties. 

Properties of the Normal Curve. 

10.19. The normal curve is obviously symmetrical about the point
$x = 0$, for its equation is independent of the sign of $x$. At this point the
ordinate has its maximum value. The mean, the median and the mode
coincide, and the curve is, in fact, that drawn in fig. 6.5, page 93, and taken
as the ideal form of the symmetrical curve.

10.20. The curve is specified completely by defining the mean
(the origin of $x$), the standard deviation $\sigma$ and the value $y_0$.

In actual practice, as, for example, when we are trying to fit a normal
curve to given data, we are not given $y_0$ itself, but have to calculate it
from the fact that the area of the curve must be equal, on the chosen
scale, to the total number of observations. For this reason we wish to
find the area under the curve

$$y = y_0\, e^{-x^2/2\sigma^2}$$

10.21. From 6.14 it will be seen that the area of a histogram, that is
to say, the total number of observations which it represents, is given by

$$\text{Area} = \mathop{S}\limits_{r=1}^{n} (f_r) \times h$$

where $h$ is the width of the interval, $f_r$ is the frequency in the $r$th interval
and there are $n$ intervals.

As the histogram tends towards the continuous curve the width of the
intervals becomes smaller and the number of terms in the summation
becomes larger. For the normal curve, which extends to infinity on
either side of the mean, the limit to which the sum tends as the intervals
become indefinitely small and the number of terms indefinitely large is
written

$$\int_{-\infty}^{\infty} y_0\, e^{-x^2/2\sigma^2}\, dx$$

the sign $\int$ being a conventional form of the summation sign $S$ and $dx$
representing the infinitesimally small value of $h$.

This is the notation of the integral calculus, and the quantity
$\int_{-a}^{b} F(x)\,dx$ is said to be the integral of $F(x)$ with respect to $x$
between the limits $-a$ and $+b$. In this book we shall not use the methods
of the integral calculus, and accordingly it will be necessary for us to state
certain results without proof. It will be sufficient if the student bears in
mind that the process of integration is one of proceeding to the limit in
cases of straightforward summation with which he is already familiar.

10.22. The area of the curve

$$y = y_0\, e^{-x^2/2\sigma^2}$$

is then

$$\int_{-\infty}^{\infty} y_0\, e^{-x^2/2\sigma^2}\, dx$$

and this is equal to

$$y_0\, \sigma \times \sqrt{2\pi} = 2.506628\, y_0\, \sigma$$

Hence the curve

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}$$

has unit area, and for this reason the equation of the normal curve is usually
written in the standard form

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2} \qquad (10.15)$$

From this the form corresponding to a distribution of any given frequency
is immediately written down. In fact, if the frequency is $N$, the
corresponding normal curve is

$$y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2} \qquad (10.16)$$
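Since the integral is stated here without proof, the reader may like to see the limiting process of 10.21 carried out numerically. The sketch below is an illustration under stated assumptions: the function name `histogram_area`, the use of mid-point ordinates, and the cut-off at ten standard deviations are all choices made here. It sums ordinate times interval width for narrower and narrower intervals and compares the result with $y_0 \sigma \sqrt{2\pi}$.

```python
import math


def histogram_area(y0, sigma, h, half_width=10.0):
    """Sum of (ordinate x interval width) over intervals of width h.

    Ordinates are taken at interval mid-points, and the range is cut off at
    +/- half_width standard deviations, beyond which the curve is negligible.
    """
    steps = int(round(2 * half_width * sigma / h))
    start = -half_width * sigma
    total = 0.0
    for r in range(steps):
        x = start + (r + 0.5) * h
        total += y0 * math.exp(-x * x / (2 * sigma ** 2)) * h
    return total


y0, sigma = 3.0, 2.0
for h in (2.0, 0.5, 0.05):
    print(h, histogram_area(y0, sigma, h))
print("y0 * sigma * sqrt(2 pi) =", y0 * sigma * math.sqrt(2 * math.pi))
```

Dividing the ordinates by this area gives the unit-area form (10.15), and multiplying by $N$ gives the curve for a total frequency $N$.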


Constants of the Normal Curve. 

10.23. The mean of the curve is, as we have seen, located at the origin.
If we wish to write the curve with reference to some other point as origin,
we can do so in the form

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-m)^2/2\sigma^2} \qquad (10.17)$$

where $m$ is the excess of the mean over the value chosen as origin.




The standard deviation of the curve is $\sigma$, and the variance is accordingly
$\sigma^2$.

The higher moments are calculated by the processes of the integral
calculus. Since the $n$th moment about the mean is given by

$$\mu_n = \frac{1}{N}\, S(f x^n)$$

we have, proceeding to the limit, that the $n$th moment of the normal curve is

$$\mu_n = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} x^n\, e^{-x^2/2\sigma^2}\, dx$$

If $n$ is odd this vanishes, as it must for any symmetrical curve. If $n$ is even
we have:

$$\mu_n = \frac{n!}{2^{n/2}\,(n/2)!}\, \sigma^n \qquad (10.18)$$

and hence,

$$\mu_4 = \frac{4 \cdot 3 \cdot 2}{2^2 \cdot 2}\, \sigma^4 = 3\sigma^4 \qquad (10.19)$$

10.24. From these results it follows that

$$\beta_1 = \gamma_1 = 0$$
$$\beta_2 - 3 = \gamma_2 = 0 \qquad (10.20)$$

i.e. the normal curve has zero kurtosis. This is, in fact, the origin of the
choice of the apparently arbitrary value 3 in the definitions of platy- and
lepto-kurtosis (9.14).

We may also state without proof the important result that all
seminvariants of the normal curve of orders higher than the second vanish
identically.
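The moments quoted above can be verified by direct summation, in the spirit of 10.21. The following is a sketch only; the function names and the step and range parameters are assumptions made for this illustration. It evaluates the closed form for even moments and compares it with a mid-point sum of $x^n$ times the unit-area ordinate.

```python
import math


def normal_moment(n, sigma):
    """nth moment about the mean of a normal curve (closed form)."""
    if n % 2 == 1:
        return 0.0                       # odd moments vanish by symmetry
    half = n // 2
    return math.factorial(n) / (2 ** half * math.factorial(half)) * sigma ** n


def numeric_moment(n, sigma, h=0.001, half_width=12.0):
    """The same moment as a mid-point sum of x^n times the unit-area ordinate."""
    steps = int(round(2 * half_width * sigma / h))
    total = 0.0
    for r in range(steps):
        x = -half_width * sigma + (r + 0.5) * h
        total += x ** n * math.exp(-x * x / (2 * sigma ** 2)) * h
    return total / (sigma * math.sqrt(2 * math.pi))


sigma = 1.5
mu2 = numeric_moment(2, sigma)
mu4 = numeric_moment(4, sigma)
print("mu_2 =", mu2, " mu_4 =", mu4, " beta_2 =", mu4 / mu2 ** 2)
```

The printed value of $\beta_2 = \mu_4/\mu_2^2$ should be very close to 3 whatever value of `sigma` is chosen.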

10.25. The mean deviation of the normal curve is

$$\sigma\sqrt{\frac{2}{\pi}} = 0.79788 \dots \sigma$$

This is the origin of the rule given in 8.21, that the mean deviation is
approximately 4/5 of the standard deviation. The result is true of the
normal curve, and very approximately true of curves which do not differ
markedly from the normal form. The rules that a range of 6 times the
standard deviation includes the great majority of the observations (8.12)
and that the quartile deviation is about 2/3 of the standard deviation (8.24)
were also suggested by the properties of the normal curve (see below,
10.28 and 10.29).

Ordinates of the Normal Curve. 

10.26. The normal curve is so important that tables have been
prepared to give (1) the ordinate of the curve corresponding to any given
value of $x$, i.e. the values of $\dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$, and (2) the areas of the curve to the
right and the left of any given ordinate, i.e. the values of
$\dfrac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-x^2/2}\, dx$ and $\dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-x^2/2}\, dx$.
Table 1 of the Appendix gives the values of the ordinate
for values of $x$ proceeding by steps of one-tenth of the standard deviation.
The values are, of course, the same for positive as for negative values
of $x$. More extended tables will be found in “Tables for Statisticians and
Biometricians, Part II.”

The ordinate of any normal curve corresponding to a specified value of
the variate is easily obtained from the table, as may be seen from the
following example:

Example 10.3. To find the ordinate of the normal curve given by

$$y = \frac{10{,}000}{4\sqrt{2\pi}}\, e^{-x^2/32}$$

corresponding to the variate value $x = 7$.

Here

$$N = 10{,}000, \quad \sigma = 4$$

Altering the value of $\sigma$ is equivalent to altering the scale of $x$. The
ordinate in this curve corresponding to $x = 7$ will be the same as the ordinate
of the curve of unit s.d. corresponding to $x = 7/4 = 1.75$, apart from the
factor $N/\sigma$.

From Appendix Table 1, when

    x = 1.8    y = 0.07895
    x = 1.7    y = 0.09405

Hence, by simple interpolation, when

    x = 1.75   y = 0.08650

The ordinate is 10,000/4 times this, i.e. is equal to 216. This is
accurate to the nearest unit.
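The arithmetic of Example 10.3 can be reproduced as follows. This is a sketch for checking purposes: the function `normal_ordinate` and its parameter names are introduced here for illustration, and the two tabulated ordinates are those quoted in the example.

```python
import math


def normal_ordinate(x, n_total=1.0, sigma=1.0):
    """Ordinate at x of a normal curve with total frequency n_total and s.d. sigma."""
    return n_total / (sigma * math.sqrt(2 * math.pi)) * math.exp(-x * x / (2 * sigma ** 2))


# Table look-up with simple interpolation, as in Example 10.3.
y_17, y_18 = 0.09405, 0.07895            # tabulated ordinates at x = 1.7 and 1.8
y_interp = y_17 + (1.75 - 1.7) / 0.1 * (y_18 - y_17)
print(round(y_interp, 5), round(10_000 / 4 * y_interp, 1))

# Direct evaluation of the same ordinate from the equation of the curve.
print(round(normal_ordinate(7, n_total=10_000, sigma=4), 1))
```

Simple interpolation slightly overstates the ordinate here because the curve is convex between 1.7 and 1.8, but both routes round to the same whole number.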


Area of the Normal Curve—the Probability Integral. 

10.27. A table of the areas of the normal curve cut off by ordinates
at specified values of $x$ is given in Table 2 of the Appendix. As in the
case of the table of ordinates, this table is applicable to all normal curves,
whatever the value of their standard deviation, the areas cut off on
$y = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}$ by ordinates at $x$ being the same as those cut off on
$y = \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ by ordinates at $\dfrac{x}{\sigma}$. More extended tables will again be
found in “Tables for Statisticians and Biometricians, Part II.”

The area of the normal curve to the left of the ordinate at $x$ or, it may
be, between the ordinates at 0 and $x$ (conventions differ) is sometimes
termed the probability integral or the error function. These names
arise from the use of the function in the theory of sampling and the theory
of errors respectively.

Example 10.4. Find the frequency represented by the smaller area of
the curve $y = \dfrac{10{,}000}{4\sqrt{2\pi}}\, e^{-x^2/32}$ cut off by the ordinate at $x = 7$.

Here

$$\sigma = 4, \quad \frac{x}{\sigma} = 1.75$$

For $\dfrac{x}{\sigma} = 1.7$ the greater fraction of area = 0.95543

For $\dfrac{x}{\sigma} = 1.8$ the greater fraction of area = 0.96407

Hence, by simple interpolation, for

$\dfrac{x}{\sigma} = 1.75$ the greater fraction of area = 0.95975

Hence the smaller fraction = 1 - 0.95975

= 0.04025

and multiplying this by 10,000, we have the frequency represented, i.e.
402.5.

More exactly, by second differences or more extended tables, the value
is 400.59.
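The more exact figure can be obtained without tables from the error function of the standard library. In the sketch below the names `area_left` and `smaller_tail_frequency` are introduced for illustration only. The relation used, that the area to the left of $z$ on the unit normal curve equals $\tfrac12\{1 + \operatorname{erf}(z/\sqrt2)\}$, is the standard one.

```python
import math


def area_left(z):
    """Fraction of the unit normal curve lying to the left of the ordinate at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def smaller_tail_frequency(x, sigma, n_total):
    """Frequency in the smaller area cut off by the ordinate at x (x > 0)."""
    return n_total * (1.0 - area_left(x / sigma))


# Tabulated greater fractions at 1.7 and 1.8, then the case of Example 10.4.
print(round(area_left(1.7), 5), round(area_left(1.8), 5))
print(round(smaller_tail_frequency(7, 4, 10_000), 2))
```

The first line printed should reproduce the two table entries used in the example, and the second the frequency obtained by the more exact methods.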

Example 10.5. A hundred coins are thrown a number of times. How
often approximately in 10,000 throws may (1) exactly 65 heads, (2) 65
heads or more, be expected?

The number of heads is given by the terms in

$$10{,}000\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^{100}$$

The standard deviation is $\sqrt{0.5 \times 0.5 \times 100} = 5$, $\dfrac{N}{\sigma} = 2000$, and the
exponent is large enough for us to be able to take the distribution as
normal.

The mean number of heads is 50, and $65 - 50 = 3\sigma$. The frequency of a
deviation of $3\sigma$ is given at once by Appendix Table 1 as 2000 x 0.00443
= 8.86, or nearly 9 throws in 10,000. A throw of 65 heads will therefore
be expected about 9 times.

The frequency of throws of 65 heads or more is given by Appendix
Table 2, but a little caution must now be used, owing to the discontinuity
of the distribution. A throw of 65 heads is equivalent to a range
of 64.5 to 65.5 on the continuous scale of the normal curve, the division
between 64 and 65 coming at 64.5. Now $64.5 - 50 = +2.9\sigma$, and a deviation
of $+2.9\sigma$ or more will only occur, as given by the table, 187 times in
100,000 throws, or, say, 19 times in 10,000.
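It is instructive to set the normal approximations of Example 10.5 beside the exact binomial values. The sketch below assumes nothing beyond the figures of the example; the helper names `ordinate` and `area_left` are introduced here for illustration.

```python
import math


def ordinate(z):
    """Ordinate of the unit normal curve at z."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def area_left(z):
    """Fraction of the unit normal curve to the left of the ordinate at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n, p, throws = 100, 0.5, 10_000
mean = n * p
sigma = math.sqrt(n * p * (1 - p))

# Normal approximations, following the steps of the example.
approx_exactly_65 = throws / sigma * ordinate((65 - mean) / sigma)
approx_65_or_more = throws * (1.0 - area_left((64.5 - mean) / sigma))

# Exact binomial values for comparison.
exact_exactly_65 = throws * math.comb(n, 65) / 2 ** n
exact_65_or_more = throws * sum(math.comb(n, m) for m in range(65, n + 1)) / 2 ** n

print(approx_exactly_65, exact_exactly_65)
print(approx_65_or_more, exact_65_or_more)
```

The use of 64.5 in place of 65 as the boundary is the correction for discontinuity described in the example; replacing it by 65 in the code shows how much the correction matters.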

10.28. From the table of areas we can find approximately the position
of the quartiles. In fact, we require the value of $\dfrac{x}{\sigma}$ which will give us 0.75
as the greater fraction of the area. From the table we see that this value
must lie between 0.6 and 0.7. Simple interpolation gives

$$\left\{0.6 + 0.1 \times \frac{2425}{3229}\right\}\sigma = 0.675\,\sigma$$

and more exact interpolation gives

$$\text{Quartile deviation} = 0.67448975\,\sigma \qquad (10.21)$$

This is the origin of the rough rule that the semi-interquartile range is
usually about 2/3 of the standard deviation.

10.29. We also observe from the table that an ordinate $3\sigma$ from the
mean cuts off an area 0.99865 of the whole. The smaller fraction left is
therefore 0.00135 of the whole. Since the curve is symmetrical, it follows
that a range of $3\sigma$ on each side of the mean will cut off all but twice this,
i.e. all but 0.00270 of the whole. This again is the origin of the rule that
such a range includes the great majority of the observations.
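Both results, the quartile deviation of 10.28 and the area within $3\sigma$ of 10.29, can be recovered numerically. The following sketch uses bisection on the area function; the function names and the bisection tolerances are assumptions of this illustration and not the interpolation method of the text.

```python
import math


def area_left(z):
    """Fraction of the unit normal curve to the left of the ordinate at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def deviate_for_area(target, lo=0.0, hi=5.0, tol=1e-10):
    """Bisection for the z at which area_left(z) equals target (0.5 <= target < 1)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if area_left(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


simple_interpolation = 0.6 + 0.1 * 2425 / 3229     # the rough value of 10.28
quartile = deviate_for_area(0.75)                   # quartile deviation / sigma
within_3_sigma = area_left(3.0) - area_left(-3.0)   # area within 3 sigma of the mean

print(round(simple_interpolation, 4), round(quartile, 8), round(within_3_sigma, 5))
```

The same routine with a target of 0.975 gives the deviate cutting off 2.5 per cent. in each tail, should that be wanted.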


The Normal Distribution as an Error Distribution. 

10.30. We have deduced the normal distribution as a limiting form
of the binomial distribution when $n$, the exponent, is large. This, however,
is only one of the ways in which the normal curve occurs in statistical
literature, and Gauss was led to it by a totally different line of reasoning,
viz. by inquiring what law of distribution errors of observation should
obey in order to make the arithmetic mean of a set of measurements the
most likely value of the “true” magnitude.

10.31. Suppose we take a universe of measurements of some magnitude,
and consider the universe of deviations from the true value. Let us
further suppose that any deviation is the result of the operation of an
indefinitely large number of small causes, each producing a small
perturbation. Let us assume that the small perturbations are all equal, and that
positive and negative perturbations are equally likely.

Then it may be shown that the distribution of errors $x$ about the true
value (taken as zero) is given by the law

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}$$

For, if $\delta$ is the amount of the perturbation, and positive and negative
perturbations are equally likely, the expected frequency of $m$ positive
errors and $n - m$ negative errors in $N$ observations is the term in
$(\tfrac{1}{2})^m (\tfrac{1}{2})^{n-m}$ in $N(\tfrac{1}{2} + \tfrac{1}{2})^n$, and the actual error is
$m\delta - (n - m)\delta = (2m - n)\delta$. Similarly, the frequency of the actual error
$\{2(m+1) - n\}\delta$ is given by the term in $(\tfrac{1}{2})^{m+1} (\tfrac{1}{2})^{n-m-1}$; and so on.
Proceeding to the limit, as $n$ becomes large,
we get the stated result precisely as for the limiting process of 10.15.

10.32. In the theory of errors it is more customary to write

$$y = \frac{h}{\sqrt{\pi}}\, e^{-h^2x^2} \qquad (10.22)$$

where $h = \dfrac{1}{\sigma\sqrt{2}}$ is called the “precision” (cf. 8.16). As $h$ increases, the
normal curve becomes narrower, and hence $h$ measures in a sense the
closeness of the bulk of observations to the true value.

The Occurrence of Normal Distributions in Nature. 

10.33. It was found at an early date that error distributions followed
the normal law more or less closely, though it must be admitted not with
any great exactitude. The fact that many universes, particularly
biometrical universes such as those classified according to height and weight,
lie distributed round the mean in a humped curve which is not unlike the
normal curve, gave rise in the first half of the nineteenth century to keen
interest. Although the term “normal” had not then been applied, there
appears to have been a feeling that the curve was the ideal to which most
distributions should in some degree attain, and that an explanation was
demanded if they did not. The normal curve was, in fact, to the early
statisticians what the circle was to the Ptolemaic astronomers.

10.34. Workers during the latter half of the nineteenth century were
more careful not to let their theories outrun their facts, and as the data
accumulated it became evident that the normal distribution was no more
usual than any other type. In fact, rather the reverse, so that the
occurrence of a normal distribution was to be regarded as something abnormal.
“The reader may well ask,” says Karl Pearson (ref. (502)), “is it not
possible to find material which obeys within probable limits the normal
law? I reply, yes, but this law is not a universal law of nature. We must
hunt for cases.”

The belief in the validity of the normal law in the theory of errors died
harder. “As M. Lippmann once said to me,” says Poincaré, in his “Calcul
des Probabilités,” “everybody believes in the law of errors, the
experimenters because they think it is a mathematical theorem, the
mathematicians because they think it is an experimental fact.”

10.35. One must, however, be careful not to go too far in seeking to
avoid an over-emphasis on the practical occurrence of the normal curve.
A certain number of distributions, more particularly those relating to
measurements on plants and animals, are approximately of the normal
form. As an example, we may take the distribution of Table 6.7, which
we show in fig. 10.3 fitted with a normal curve.

Place of the Normal Curve in Theory. 

10.36. Strangely enough, the realisation that the normal distribution 
did not correspond to any widespread natural effect did not diminish its 
importance in statistical theory. On the contrary, the normal distribution 
has increased in importance in recent years. It is instructive to consider 
why this is so. 

In the first place, the normal curve and the normal integral have
numerous mathematical properties which make them attractive and
comparatively easy to manipulate. We have, for instance, already seen that
the moments and seminvariants of the normal curve are expressible in
simple forms.

Now the normal form is reasonably close to many distributions of the
humped type. If, therefore, we are ignorant of the exact nature of a
humped distribution, or know the form but find it mathematically
intractable, we may assume as a first approximation that the distribution is normal
and see where this assumption leads us. It is not infrequently found that
a universe represented in this way is sufficiently accurately specified for the
purposes of the inquiry.

10.37. Secondly, we shall find, when we come to consider sampling
distributions, that many of the universes which occur are of the normal
form, either exactly or to a satisfactory degree of approximation.

10.38. Thirdly, the theory of the normal curve has been applied to
the graduation of curves which are not normal. The Scandinavian school,
whose interests are mainly actuarial, have developed a technique for
expressing a given distribution in the form of an infinite series whose terms
depend on the quantity $e^{-x^2/2}$ and certain dependent functions.

Fig. 10.3. The Distribution of Stature for Adult Males in the British Isles (fig. 6.6, p. 95),
fitted with a Normal Curve. To avoid confusing the figure, the frequency-polygon
has not been drawn in, the tops of the ordinates being shown by small circles.

10.39. Fourthly, distributions which are not normal can sometimes
be brought to a form approximating to the normal by a transformation of
the variate. A universe which is skew with respect to a variate $x$, for
instance, might be normal when we take $\sqrt{x}$ as the variate. We gave an
example of this kind of effect in Exercise 6.6, page 110, where we saw that a
universe of men classified according to their weight was skew, whereas a
universe classified according to height (which we may take to be roughly
proportional to the cube root of the weight) is nearly normal.

The Poisson Distribution. 

10.40. We have found that the limit to the binomial would be a
normal curve even if $p$ and $q$ were unequal, provided that $n$ were increased
sufficiently to make $(q - p)$ small compared with $\sqrt{npq}$. We now propose
to find the limit to the same series if one of the chances, say $q$, becomes
indefinitely small and $n$ is increased sufficiently to keep $nq$ finite, but not
necessarily large (practical values are in fact usually small).

Let us suppose that $q$ is very small and that $qn$ is equal to the finite
number $m$.

In the binomial $(q + p)^n$, the term

$$\frac{n!}{r!\,(n-r)!}\, q^r p^{n-r} = \frac{n!}{r!\,(n-r)!} \left(\frac{m}{n}\right)^r \left(1 - \frac{m}{n}\right)^{n-r}$$

$$= \frac{m^r}{r!} \left(1 - \frac{m}{n}\right)^n \frac{n!}{(n-r)!\; n^r \left(1 - \dfrac{m}{n}\right)^r} \qquad (10.23)$$


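Now the limit of $\left(1 - \dfrac{m}{n}\right)^n$ as $n$ becomes large $= e^{-m}$.

Applying Stirling's approximation (10.16) when $n$ is large, the term

$$\frac{n!}{(n-r)!\; n^r \left(1 - \dfrac{m}{n}\right)^r} \qquad (10.24)$$

$$= \frac{\sqrt{2\pi n}\; e^{-n}\, n^n}{\sqrt{2\pi(n-r)}\; e^{-n+r}\, (n-r)^{n-r}\, n^r \left(1 - \dfrac{m}{n}\right)^r}$$

$$= \frac{1}{e^r \left(1 - \dfrac{r}{n}\right)^{n-r+\frac{1}{2}} \left(1 - \dfrac{m}{n}\right)^r}$$

Now the limit of $\left(1 - \dfrac{r}{n}\right)^n = e^{-r}$, as we need not consider terms in which
$r$ exceeds quantities of the order $\sqrt{nq}$, and the limits of
$\left(1 - \dfrac{r}{n}\right)^{-r+\frac{1}{2}}$ and $\left(1 - \dfrac{m}{n}\right)^r$
are both unity. Hence the limit of (10.24) is unity, and the limit of
(10.23) is

$$e^{-m}\, \frac{m^r}{r!}$$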

10.41. Hence the successive terms in the binomial are

$$e^{-m}, \quad e^{-m} m, \quad e^{-m} \frac{m^2}{2!}, \quad e^{-m} \frac{m^3}{3!}, \quad \text{etc.}$$

and the limit of $(q + p)^n$ is

$$e^{-m} \left(1 + m + \frac{m^2}{2!} + \frac{m^3}{3!} + \dots\right) \qquad (10.25)$$

This expression is called Poisson's distribution, or Poisson's
exponential limit. It was first published by Poisson in 1837, but has
subsequently been rediscovered by numerous writers.
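The approach of the binomial terms to Poisson's limit may be watched numerically. The sketch below is illustrative only: the function names, the value $m = 2$ and the particular values of $n$ are assumptions chosen here. It keeps $nq = m$ fixed while $n$ is increased and records the largest difference between corresponding terms.

```python
import math


def binomial_term(n, q, r):
    """Term in q^r p^(n-r) of (q + p)^n, q being the small chance."""
    return math.comb(n, r) * q ** r * (1 - q) ** (n - r)


def poisson_term(m, r):
    """Corresponding term of Poisson's limit: e^(-m) m^r / r!."""
    return math.exp(-m) * m ** r / math.factorial(r)


def largest_gap(n, m, r_max=10):
    """Largest difference between binomial and Poisson terms for r = 0..r_max."""
    q = m / n
    return max(abs(binomial_term(n, q, r) - poisson_term(m, r))
               for r in range(r_max + 1))


for n in (10, 100, 10_000):
    print(n, largest_gap(n, 2.0))
```

The gap should fall roughly in proportion to $1/n$, in keeping with the neglected factors of 10.40, all of which differ from unity by quantities of that order.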

Constants of the Poisson Distribution. 

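10.42. Taking an origin located at the first term of the distribution,
we have:

$$\mu_1' = e^{-m}\left(0 + m \times 1 + \frac{m^2}{2!} \times 2 + \frac{m^3}{3!} \times 3 + \dots\right)$$

$$= m e^{-m}\left(1 + \frac{m}{1!} + \frac{m^2}{2!} + \dots\right)$$

$$= m e^{-m} e^{m} = m$$

$$\mu_2' = e^{-m}\left(m \times 1^2 + \frac{m^2}{2!} \times 2^2 + \frac{m^3}{3!} \times 3^2 + \dots\right)$$

$$= m e^{-m}\left(1 + m(1 + 1) + \frac{m^2}{2!}(2 + 1) + \dots\right)$$

$$= m e^{-m}\left\{\left(1 + \frac{m}{1!} + \frac{m^2}{2!} + \dots\right) + m\left(1 + \frac{m}{1!} + \frac{m^2}{2!} + \dots\right)\right\}$$

$$= m e^{-m}\left(e^{m} + m e^{m}\right)$$

$$= m(m + 1)$$

It may also be shown that

$$\mu_3' = m(m^2 + 3m + 1) = m\{(m + 1)^2 + m\}$$
$$\mu_4' = m(m^3 + 6m^2 + 7m + 1)$$

From these results we have immediately:

$$\text{Mean} = m \qquad (10.26)$$

$$\mu_2 = m(m + 1) - m^2 = m$$

Hence,

$$\sigma^2 = m = \text{mean} \qquad (10.27)$$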


10.43. The third and fourth moments about the mean will be found
to be

$$\mu_3 = m \qquad (10.28)$$
$$\mu_4 = 3m^2 + m \qquad (10.29)$$

so that

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{1}{m} \qquad (10.30)$$

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{3m^2 + m}{m^2} = 3 + \frac{1}{m} \qquad (10.31)$$

These results should be compared with the expressions

$$\beta_1 = \frac{(p - q)^2}{npq}$$
$$\beta_2 = 3 + \frac{1 - 6pq}{pqn}$$

for the binomial. They are, as might be expected, the limits of those
expressions when $q = m/n$ and $n$ is large.
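The constants of the Poisson distribution can be confirmed by summing the series directly. In this sketch the function name `poisson_moments`, the truncation point `r_max` and the value $m = 2.5$ are assumptions made for illustration. The terms are built up by the recurrence that each term is the preceding one multiplied by $m/(r+1)$.

```python
import math


def poisson_moments(m, r_max=200):
    """Mean, mu_2, mu_3 and mu_4 of the Poisson series, by direct summation."""
    terms = []
    term = math.exp(-m)                # frequency of r = 0
    for r in range(r_max + 1):
        terms.append(term)
        term *= m / (r + 1)            # next term = this term * m / (r + 1)
    mean = sum(r * t for r, t in enumerate(terms))
    mu = [sum((r - mean) ** k * t for r, t in enumerate(terms)) for k in (2, 3, 4)]
    return mean, mu[0], mu[1], mu[2]


m = 2.5
mean, mu2, mu3, mu4 = poisson_moments(m)
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
print(mean, mu2, mu3, mu4)
print(beta1, beta2)
```

For any `m` the first three printed values should all be close to `m`, the fourth to $3m^2 + m$, and the two betas to $1/m$ and $3 + 1/m$.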

10.44. We may state without proof that all the seminvariants of the
Poisson distribution are equal to $m$.

10.45. Tables of the limit $e^{-m}\, \dfrac{m^r}{r!}$ for various values of $m$ and $r$ have
been published by several authorities. One such set will be found in
“Tables for Statisticians and Biometricians, Part

Fig. 10.4. Frequency-polygons of the Poisson Series for Various Values of m.

The form of the frequency-polygon of the distribution (which, like the
binomial and unlike the normal, is discontinuous) can be judged from
fig. 10.4, in which the polygons for various values of $m$ are drawn. It will
be seen that for low values of $m$ the polygon is very skew, but that for
larger values it tends towards a symmetrical form.



10.46. The condition that $p$ or $q$ shall be small, $np$ or $nq$ remaining
finite, implies that in practice we should expect to find a Poisson
distribution in cases where the chance of any individual being a “success” was
small. Such a case might arise, for example, in considering the deaths
from a rare disease in a population, the chance of any individual dying
from it being small.

10.47. Attention to the fact that comparatively rare events are not
haphazard was first directed by Quetelet and von Bortkiewicz. The
latter's data of the number of men killed by the kick of a horse in certain
Prussian army corps in twenty years (1875-94) have become classical.

The frequency-distribution of the number of deaths in 10 corps per
army corps per annum over twenty years was:


    Deaths.      Frequency.

       0            109
       1             65
       2             22
       3              3
       4              1


Here the total number of deaths was 122, and hence the mean deaths per
army corps per annum is 0.61. Taking this as $m$, we find the following
values for various numbers of deaths per annum:

    Deaths.      Frequency assigned by
                 Poisson's Limit.

       0            108.7
       1             66.3
       2             20.2
       3              4.1
       4              0.7 (4 and over)


If we calculate $\sigma^2$ for the actual distribution, we find:

$$\sigma = 0.78, \quad \sigma^2 = 0.6079$$

Hence, $\sigma^2$ is nearly equal to the mean, which is in accordance with theory.
The agreement is, in fact, very much closer than is usual. Many
distributions are now available for the frequency of individuals who have met
with 0, 1, 2, . . . accidents, e.g. in factories, during a given period of time,
and more often than not such distributions give a value of the variance
exceeding the mean. This state of affairs can be accounted for on the
assumption that the individuals at risk have varying degrees of
“accident-proneness,” and the assumption has been corroborated by finding that
those individuals who have the largest number of accidents in one period
are, on the whole, those who have most accidents during a succeeding
period.
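The fitting of Poisson's limit to von Bortkiewicz's data can be reproduced in a few lines. The sketch below uses only the observed frequencies quoted above; the variable names are introduced here, and the last class is taken as “4 and over” and found by subtraction, which is an assumption about how the tabulated 0.7 was obtained.

```python
import math

deaths = [0, 1, 2, 3, 4]
observed = [109, 65, 22, 3, 1]

total = sum(observed)                                  # 200 corps-years
mean = sum(d * f for d, f in zip(deaths, observed)) / total
variance = sum(d * d * f for d, f in zip(deaths, observed)) / total - mean ** 2

# Poisson frequencies with m put equal to the observed mean. The last class
# is "4 and over", obtained by subtraction so that the total is preserved.
expected = [total * math.exp(-mean) * mean ** r / math.factorial(r) for r in range(4)]
expected.append(total - sum(expected))

print("mean =", mean, " variance =", round(variance, 4))
for d, f, e in zip(deaths, observed, expected):
    print(d, f, round(e, 1))
```

The ratio of variance to mean printed here is very nearly unity; for accident data of the kind mentioned above the same calculation would commonly give a ratio well above unity.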

Another example of the Poisson distribution is given in Exercise 10.17
at the end of this chapter. The early instances of the distribution were
nearly all demographic, and for some time it remained more of a curiosity
than a useful tool. In 1907, however, “Student” drew attention to a
class of haemacytometer counts to which the distribution seemed
appropriate, and since that time it has found several important biological
applications. It also appears in problems of controlling road and telephone traffic.

Pearson Curves. 

10.48. The process of obtaining the normal curve as a limit of the
binomial suggested to Karl Pearson an investigation into a series of
analogous curves which may be regarded as limits to skew binomials or to
distributions from a finite universe, e.g. by drawing $r$ balls at a time from
a bag which contains a finite number $N$ of black and white balls in given
proportions. One such curve was of the form

$$y = y_0 \left(1 + \frac{x}{a}\right)^{\gamma a} e^{-\gamma x}$$

This set of curves, divided into twelve types, which were later regarded
from rather a different standpoint, can be made to fit a large number of the
distributions occurring in practice.

In the curve given above, $\gamma$, $a$ and the origin can all be obtained from
the first three moments. For the other curves of Pearson's system,
except some degenerate types, the first four moments are necessary to
specify the constants of the curve completely. The distributions
considered hitherto have required, in addition to the area (number of
observations), either the mean only (Poisson) or the mean and standard deviation
(normal curve) to determine their constants; but the principle of fitting
for the more general curves remains the same. The actual moments of
the curves are equated to the moments expressed in terms of the constants,
such as $\gamma$ and $a$, which are to be found. For full details of these curves,
the method of determining the type to choose and the method of fitting,
the student is referred to Elderton's book (ref. (160)).


SUMMARY. 

1. If the chance of the success of an event is $p$, and of its failure $q$, then,
provided that the chance remains constant throughout the trials, the
expected frequencies of 0, 1, 2, . . . successes in $N$ sets of $n$ trials are the
1st, 2nd, etc. terms in the binomial

$$N(q + p)^n$$

2. The mean of the binomial is $pn$ and its standard deviation is $\sqrt{npq}$.

3. For the binomial:

$$\beta_1 = \frac{(p - q)^2}{npq}, \quad \beta_2 = 3 + \frac{1 - 6pq}{pqn}$$

4. If neither $p$ nor $q$ is small, the binomial tends for large values of $n$
to the form

$$y = y_0\, e^{-x^2/2\sigma^2}$$

5. This curve, which may also be written

$$y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}$$

is called the normal curve.



THE POISSON DISTRIBUTION. 193 

6. The standard deviation of the normal curve is $\sigma$. Its third moment
is zero, and the fourth moment is $3\sigma^4$. Hence,

$$\beta_1 = 0, \qquad \beta_2 = 3$$

All seminvariants higher than the second are zero.

7. In the theory of errors the normal universe is usually written:

$$y = \frac{h}{\sqrt{\pi}} e^{-h^2 x^2}$$

$h = \frac{1}{\sigma\sqrt{2}}$ being called the precision.

8. The mean deviation of the normal curve is

$$\sigma\sqrt{\frac{2}{\pi}} = 0.79788 \ldots \sigma$$

and the quartile deviation (semi-interquartile range) is 0.67448975 . . . $\sigma$.

9. A range $3\sigma$ on each side of the mean of the normal curve contains
0.9973 of the distribution.

10. If p or q is small and one of pn, qn is finite and equal to m, the
binomial distribution tends to the limit

$$e^{-m}\left(1 + m + \frac{m^2}{2!} + \ldots + \frac{m^r}{r!} + \ldots\right)$$

This is called the Poisson distribution.

11. The mean of the Poisson distribution is m, and $\sigma^2$ also equals m.

12. For the Poisson distribution:

$$\beta_1 = \frac{1}{m}, \qquad \beta_2 = 3 + \frac{1}{m}$$

and all the seminvariants are equal to m.
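
The results summarised above may be checked numerically. The following is a minimal sketch in Python, not part of the original text: the function names and the illustrative values (n = 8, p = 1/3, N = 1000, m = 2) are our own assumptions. It forms the terms of the binomial, computes the mean, standard deviation, β1 and β2 directly from the frequencies, and sets them beside the formulae of paragraphs 2 and 3; it then compares a binomial with small p against the Poisson terms of paragraph 10.

```python
from math import comb, exp, factorial, sqrt

def binomial_frequencies(N, n, p):
    """Terms of N(q + p)^n: expected frequencies of 0, 1, ..., n successes."""
    q = 1.0 - p
    return [N * comb(n, r) * q ** (n - r) * p ** r for r in range(n + 1)]

def poisson_frequencies(N, m, r_max):
    """Terms N e^{-m} m^r / r! for r = 0, 1, ..., r_max."""
    return [N * exp(-m) * m ** r / factorial(r) for r in range(r_max + 1)]

def constants(freqs):
    """Mean, standard deviation, beta_1 and beta_2 of frequencies indexed by r."""
    total = sum(freqs)
    mean = sum(r * f for r, f in enumerate(freqs)) / total
    mu2, mu3, mu4 = (
        sum((r - mean) ** k * f for r, f in enumerate(freqs)) / total
        for k in (2, 3, 4)
    )
    return mean, sqrt(mu2), mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

# Illustrative values only (assumed): sets of 8 trials with p = 1/3.
n, p = 8, 1.0 / 3.0
q = 1.0 - p
mean, sd, beta1, beta2 = constants(binomial_frequencies(1000, n, p))
print(mean, n * p)                                # mean against pn
print(sd, sqrt(n * p * q))                        # s.d. against sqrt(npq)
print(beta1, (q - p) ** 2 / (n * p * q))          # beta_1
print(beta2, 3 + (1 - 6 * p * q) / (n * p * q))   # beta_2

# Poisson limit: p small, n large, pn = m held at 2 (assumed values).
m = 2.0
poisson = poisson_frequencies(1000, m, 10)
rare = binomial_frequencies(1000, 500, m / 500)[:11]
```

Each printed pair should agree to within rounding error, and the lists `rare` and `poisson` should differ by less than one unit of frequency in every term.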


EXERCISES. 

10.1. A perfect cubic die is thrown a large number of times in sets of 8. 
The occurrence of a 5 or a 6 is called a success. In what proportion of the sets 
would you expect 3 successes? 

10.2. The following data, due to W. F. R. Weldon, show the results of 
throwing 12 dice 4096 times, a throw of 4, 5 or 6 being called a success :— 


Successes.   Frequency.     Successes.   Frequency.
    0            -              7           847
    1            7              8           536
    2           60              9           257
    3          198             10            71
    4          430             11            11
    5          731             12             -
    6          948            Total        4096

Find the expected frequencies, and compare the actual mean and standard 
deviation with those of the expected distribution. 

10.3. In the previous example find the equation of the normal curve which 
has the same mean, standard deviation and total frequency as the observed 
distribution. 

Find the frequencies to be expected if the distribution were represented 
exactly by the ordinates of this curve and compare them with the actual 
frequencies. 

10.4. Assuming that half the population are consumers of chocolate, so that 
the chance of an individual being a consumer is ½, and assuming that 100
investigators each take ten individuals to see whether they are consumers, how
many investigators would you expect to report that three people or less were 
consumers ? 

10.5. An irregular six-faced die is thrown, and the expectation that in 10 
throws it will give five even numbers is twice the expectation that it will give 
four even numbers. How many times in 10,000 sets of 10 throws would you
expect it to give no even numbers?

10.6. If two normal universes have the same total frequency but the σ of
one is k times that of the other, show that the maximum frequency of the first
is 1/k that of the other.

10.7. Find graphically or otherwise the point of inflection of the normal
curve, and show that it occurs at a distance σ from the mean ordinate.

10.8. Show that if np be a whole number, the mean of the binomial coincides
with the greatest term.

10.9. Show that if two symmetrical binomial distributions of degree n (and
of the same number of observations) are so superposed that the rth term of
the one coincides with the (r + 1)th term of the other, the distribution formed by
adding superposed terms is a symmetrical binomial of degree (n + 1).

[Note - It follows that if two normal distributions of the same area and 
standard deviation are superposed so that the difference between the means is 
small compared with the standard deviation, the compound curve is very 
nearly normal.] 

10.10. Calculate the ordinates of the binomial $1024(0.5 + 0.5)^{10}$, and compare
them with those of the normal curve.

10.11. If skulls are classified as dolichocephalic when the length-breadth
index is under 75, mesocephalic when the same index lies between 75 and 80,
and brachycephalic when the index is over 80, find approximately (assuming
that the distribution is normal) the mean and standard deviation of a series
in which 58 per cent. are stated to be dolichocephalic, 38 per cent. mesocephalic
and 4 per cent. brachycephalic.

10.12. Find the deciles of the normal curve. 

10.13. Write down the normal universe which has the same mean and
(uncorrected) standard deviation as that of the last column of Table 6.7, page 91,
and find the mean deviation and quartile deviation. Compare the results with 
the corresponding quantities for the actual distribution. 

10.14. Proceed similarly for the skew universe of Table 6.8, page 96.

10.15. In Exercise 10.4, if 1000 investigators each choose 100 individuals,
how many would you expect to report that more than 60 persons are consumers?

10.16. Taking the universe of screws of Table 6.3, page 84, find the normal
universe which has the same standard deviation and a mean of 1 inch.
Compare the frequencies given by this universe with the actual frequencies.

10.17. The following data (Lucy Whitaker, ref. (190)) give the number of
deaths of women over 85 published in The Times during 1910-12:

Number of Deaths
    per day.        Frequency.
       0               364
       1               376
       2               218
       3                89
       4                33
       5                13
       6                 2
       7                 1

Find the frequencies of the Poisson distribution which has the same mean as
this distribution, and compare your results with the actual frequencies. For
the purpose of this example, simple interpolation in the tables given in “Tables
for Statisticians and Biometricians” is sufficient.

10.18. In the data of the previous exercise calculate the first four
seminvariants.



CHAPTER 11. 


CORRELATION. 

Bivariate Universes. 

11.1. In Chapters 6 to 10 we considered the members of a universe 
classified according to the values of a single variable ; and we saw how 
they could be grouped into a frequency-distribution whose characteristics
could be described by certain constants. We have now to proceed
to the case of two variables, in which each member of the universe will
exhibit two values, one for each of the variables under consideration. 

A universe of this kind is called a bivariate universe. One of our 
main topics will be the way in which the two variables are related in the 
universe. 

11.2. If the corresponding values of the two variables are noted for 
each member, the methods of classification employed in the previous 
chapters may be applied to both variables. We can thus group our data 
into a table of double entry, or contingency table (Chapter 5), showing 
the frequencies of pairs of values lying within given class-intervals. Six 
such tables are given below as illustrations for the following variables : 
Table 11.1, two measurements on a shell : Table 11.2, ages of husbands 
and their wives in marriages taking place in England and Wales in 1933; 
Table 11.3, statures of fathers and their sons ; Table 11.4, age and yield of
milk in cows; Table 11.5, the rate of discount and ratio of reserves to 
deposits in American banks ; Table 11.6, the proportion of male to total 
births and the total numbers of births in the registration districts of 
England and Wales. 

Arrays and Correlation Tables. 

11.3. Each row in such a table gives the frequency-distribution of the
first variable for the members of the universe in which the second variable
lies within the limits stated on the left of the row. Similarly for the
columns. As “columns” and “rows” are distinguished only by the
accidental circumstances of the one set running vertically and the other
horizontally, and the difference has no statistical significance, the word
array has been suggested as a convenient term to denote either a row or
a column.

If the values of X in one array are associated with values of Y in an
interval centred at $Y_n$, then $Y_n$ is called the type of the array.

11.4. A grouped frequency-distribution of the type of Tables 11.1 to
11.6 may then be termed a bivariate frequency-distribution ; but if we are
particularly interested in the relationship between the two variates it is
sometimes called a correlation table. The difference between a correlation
table and a contingency table lies in the fact that the latter term may
be, and usually is, applied to tables classified according to unmeasured
quantities or imperfectly defined intervals.



[Table 11.1. Two measurements on a shell, of which (2) is the dorsoventral diameter, mm.; body of table not legible.]


11.5. We need add very little to what was said in Chapter 6 about
the choice and magnitude of class-intervals and the classification of data. 
When the intervals have been fixed, the table is readily compiled from the 
raw material by taking a large sheet of paper ruled with arrays properly 













Table 11.2. Correlation between Ages of (1) Husband and (2) Wife in Marriages in
England and Wales in 1933. (Figures in hundreds; certain marriages in which no
age specified are omitted. Data from Registrar-General's Statistical Review of
England and Wales for 1933, Tables, Part II, Civil.)

(2) Age of                          (1) Age of Husband (Years).
Wife
(Years).    15-   20-   25-   30-   35-   40-   45-   50-   55-   60-   65-   70-   75-  Total.

15-          33   189    56     8     2     -     -     -     -     -     -     -     -    288
20-          18   682   585   106    19     5     2     1     -     -     -     -     -   1418
25-           1   140   511   179    40    14     6     3     1     1     -     -     -    896
30-           -    11    75   101    42    20    10     5     2     1     1     -     -    268
35-           -     2    10    24    28    19    13     8     5     2     1     -     -    112
40-           -     -     1     5     9    14    12    10     6     4     2     1     -     64
45-           -     -     -     1     3     5     9     9     7     4     3     1     -     42
50-           -     -     -     -     -     1     3     7     6     5     3     1     -     26
55-           -     -     -     -     -     -     1     3     5     4     3     1     -     17
60-           -     -     -     -     -     -     -     1     1     4     3     2     -     11
65-           -     -     -     -     -     -     -     -     1     1     3     2     1      8
70-           -     -     -     -     -     -     -     -     -     -     1     1     1      3

Total        52  1024  1238   424   143    78    56    47    34    26    20     9     2   3153


headed in the same way as the final table and entering a small mark in
the compartment corresponding to the variate values exhibited by each
individual. If facility of checking be of great importance, each pair of
recorded values may be entered on a separate card and these dealt into
little packs on a board ruled in squares, or into a divided tray ; each pack
can then be run through to see that no card has been mis-sorted. The
difficulty as to the intermediate observations (values of the variables
corresponding to divisions between class-intervals) will be met in the same
way as before if the value of one variable alone be intermediate, the unit
of frequency being divided between two adjacent compartments. If both
values of the pair be intermediates, the observation must be divided
between four adjacent compartments, and thus quarters as well as halves
may occur in the table, as, e.g., in Table 11.3. In this case the statures of
fathers and sons were measured to the nearest quarter-inch and
subsequently grouped by 1-inch intervals : a pair in which the recorded
stature of the father is 60.5 in. and that of the son 62.5 in. is accordingly
entered as 0.25 to each of the four compartments under the columns
59.5-60.5, 60.5-61.5, and the rows 61.5-62.5, 62.5-63.5.
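
The rule for intermediate observations can be put in the form of a short tally routine. What follows is only a sketch in Python under stated assumptions: the function names are hypothetical, the lowest class limit is taken (for illustration) as 58.5 in. with intervals of 1 inch, and compartments are numbered from 0. It reproduces the case quoted above, the pair (60.5, 62.5) contributing 0.25 to each of four compartments.

```python
from collections import defaultdict
from math import floor

def class_shares(value, lower, width):
    """Share of one observation falling in each class-interval.

    Classes are numbered 0, 1, 2, ... upwards from `lower`. A value lying
    exactly on a division between two classes is split equally between them.
    """
    position = (value - lower) / width
    index = floor(position)
    if position == index and index > 0:      # on a division between classes
        return {index - 1: 0.5, index: 0.5}
    return {index: 1.0}

def correlation_table(pairs, x_lower, y_lower, width):
    """Tally pairs (x, y) into compartments keyed by (x class, y class)."""
    table = defaultdict(float)
    for x, y in pairs:
        for i, share_x in class_shares(x, x_lower, width).items():
            for j, share_y in class_shares(y, y_lower, width).items():
                table[(i, j)] += share_x * share_y
    return dict(table)

# Father 60.5 in., son 62.5 in.; lower limit 58.5 in. is an assumed value.
table = correlation_table([(60.5, 62.5)], x_lower=58.5, y_lower=58.5, width=1.0)
print(table)
```

With these assumed limits, classes 1 and 2 of the father are the intervals 59.5-60.5 and 60.5-61.5, and classes 3 and 4 of the son are 61.5-62.5 and 62.5-63.5.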


Frequency-surface and Stereogram. 

11.6. The distribution of frequency for two variables may be represented
by a surface in three dimensions in the same way as the frequency-distribution
for a single variable may be represented by a curve in two.
We may imagine the surface to be obtained by erecting at the centre of
every compartment of the correlation table a vertical of length proportionate
to the frequency in that compartment, and joining up the tops of the
verticals. If the compartments were made smaller and smaller while the
class-frequencies remained finite, the irregular figure so obtained would
approximate more and more closely towards a continuous curved surface,
a frequency-surface, corresponding to the frequency-curves for single
variables of Chapter 6. The volume of the frequency-solid over any area
drawn on its base gives the frequency of pairs of values falling within that
area, just as the area of the frequency-curve over an interval of the base
line gives the frequency of observations within that interval.

[Table 11.3. Correlation between (1) Stature of Father and (2) Stature of Son, in inches, grouped by 1-inch intervals; body of table not legible.]

11.7. Similarly, a figure analogous to the frequency-polygon or the
histogram may be constructed by drawing the frequency-distributions for
all arrays of the one variable, to the same scale, on sheets of cardboard,
cutting-out and erecting the cards vertically on a base-board at equal
distances apart, or by marking out a base-board in squares corresponding
to the compartments of the correlation table, and erecting on each square
a rod of wood of height proportionate to the frequency. Such solid
representations [...]

[Table 11.4. Correlation between (1) Age of cows and (2) Yield of Milk per Week (Gallons; Central Value of Interval); body of table not legible.]




































Chapter 6, fig. 6.5, page 93. Like the symmetrical distribution for the
single variable, this is a very rare form of distribution in economic statistics,
but approximate illustrations may be drawn from anthropometry. Fig.
11.1 shows the ideal form of the surface, somewhat truncated, and fig. 11.3
the distribution of Table 11.3, which approximates to the same type;
the difference in steepness is, of course, merely a matter of scale. The
maximum frequency occurs in the centre of the whole distribution, and
the surface is symmetrical round the vertical through the maximum, equal
frequencies occurring at equal distances from the mode on opposite sides.

[Table 11.6. Correlation between (1) the proportion of male to total births and (2) Total Number of Births in District (000's omitted) during Decade; body of table not legible.]

Table 11.7. Showing the Monthly Index-numbers of Prices of (1) Animal Feeding-stuffs
and (2) Home-grown Oats in England and Wales for 1931-1935. The index-numbers
are based on prices in corresponding months of 1911-13. (Data from Agricultural
Market Report for England and Wales.)

              Index of        Index of                    Index of        Index of
Month.        Feeding-stuffs  Oats          Month.        Feeding-stuffs  Oats
              Price.          Price.                      Price.          Price.

1931 Jan.          78            84         1933 July          85            75
     Feb.          77            82              Aug.          83            79
     Mar.          85            82              Sept.         80            78
     Apr.          88            85              Oct.          78            78
     May           87            89              Nov.          80            76
     June          82            90              Dec.          83            75
     July          81            88
     Aug.          77            92         1934 Jan.          82            80
     Sept.         70            83              Feb.          83            91
     Oct.          83            89              Mar.          85            87
     Nov.          97            98              Apr.          83            84
     Dec.          93            99              May           82            81
                                                 June          85            83
1932 Jan.          95           102              July          88            83
     Feb.          97           102              Aug.         101            92
     Mar.         102           105              Sept.        102            98
     Apr.          99           105              Oct.          98            94
     May           97           107              Nov.          96            94
     June          91           107              Dec.          98            95
     July          91           101
     Aug.          97           106         1935 Jan.          98           100
     Sept.         92            96              Feb.          92            99
     Oct.          89            90              Mar.          92            96
     Nov.          90            85              Apr.          90            98
     Dec.          90            81              May           88            97
                                                 June          86            98
1933 Jan.          92            84              July          83            99
     Feb.          91            85              Aug.          80            92
     Mar.          90            84              Sept.         81            90
     Apr.          86            81              Oct.          86            89
     May           85            76              Nov.          83            87
     June          85            77              Dec.          82            83



The next simplest type of surface corresponds to the second type of
frequency-curve, the moderately asymmetrical. Most, if not all, of the
distributions of arrays are asymmetrical, and like the distributions of fig. 6.7;
the surface is consequently asymmetrical, and the maximum does not lie
in the centre of the distribution. This form is fairly common, and
illustrations might be drawn from a variety of sources: economics, meteorology,
anthropometry, etc. The data of Table 11.4 will serve as an example.
The total distributions and the distributions of the majority of the arrays
are asymmetrical, the rows being markedly so. The maximum frequency
lies towards the upper end of the table in the compartment under the row
headed “16” and column headed “4.” The frequency falls off very
rapidly towards the lower ages, and slowly in the direction of old age.



Outside these two forms, it seems impossible to delimit empirically any
simple types. Tables 11.5 and 11.6 are given simply as illustrations of two
very divergent forms. Fig. 11.2 gives a graphical representation of the
former by the method corresponding to the histogram of Chapter 6, the
frequency in each compartment being represented by a square pillar. The
distribution of frequency is very characteristic, and quite different from
that of any of the Tables 11.1 to 11.4.








The Scatter Diagram. 

11.9. There is another method of representing bivariate data graphically
which is particularly useful for ungrouped data. Take, for instance,
the data of Table 11.7, giving the index-numbers of prices of animal
feeding-stuffs and home-grown oats for each month of the years 1931-35.
There are only 60 pairs of values, and the data cannot be grouped into
a frequency-distribution with class-intervals of reasonable size without
giving rise to irregular frequencies. We may, however, proceed as
follows:

On squared paper take two axes at right angles, one axis corresponding
to the variable X and the other to the variable Y (see fig. 11.4). To each
member of the universe there will correspond a pair of values X, Y, which
in turn will correspond to a point whose abscissa on the diagram is X and
whose ordinate is Y. Thus the universe, when represented in this way,
will give a swarm of points on the diagram, and we can interpret the ways
in which these points cluster or scatter as properties of the relationship
between the two variables. Fig. 11.4 shows the data of Table 11.7 plotted
in this way. It will be observed that the points tend to distribute
themselves so that high and low values of X correspond to high and low values
of Y respectively.

Such a figure is called a scatter diagram.

Fig. 11.4. Scatter Diagram of Index-numbers of Prices of (1) Animal Feeding-stuffs
and (2) Home-grown Oats (Table 11.7). Horizontal axis: index number of
feeding-stuffs prices. For the meaning of the straight lines, see Example 11.1,
page 217.
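
The construction of a scatter diagram is mechanical enough to be sketched in a few lines of Python. The routine below is an illustration only, with a hypothetical function name; it prints the diagram in character form, one cell per unit of X and of Y, and uses the twelve pairs for 1931 as they are printed in Table 11.7 (feeding-stuffs as X, oats as Y).

```python
def scatter_rows(pairs):
    """Return the scatter diagram as lines of text, largest Y first.

    Each character cell is one unit of X wide and one unit of Y high;
    '.' marks an empty cell and a digit the number of points in it.
    """
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x0, y0 = min(xs), min(ys)
    width, height = max(xs) - x0 + 1, max(ys) - y0 + 1
    counts = [[0] * width for _ in range(height)]
    for x, y in pairs:
        counts[y - y0][x - x0] += 1
    return ["".join("." if c == 0 else str(min(c, 9)) for c in row)
            for row in reversed(counts)]

# Index-numbers for the twelve months of 1931, as printed in Table 11.7.
feeding = [78, 77, 85, 88, 87, 82, 81, 77, 70, 83, 97, 93]
oats = [84, 82, 82, 85, 89, 90, 88, 92, 83, 89, 98, 99]
diagram = scatter_rows(list(zip(feeding, oats)))
print("\n".join(diagram))
```

A proper plotting library would serve better for the full sixty pairs; the character grid is chosen here only so that the sketch needs nothing beyond the standard language.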

11.10. We can also represent a grouped bivariate frequency table on
a scatter diagram, though less satisfactorily and with some labour. For
this purpose axes are taken as before and abscissae and ordinates drawn to
correspond to the divisions of the frequency table. The diagram will then
be divided into compartments corresponding to the compartments of the
table. In each compartment we place a number of dots equal to the
frequency in the corresponding compartment of the table. We have, as a
rule, no guide as to the disposition of these dots within their respective
cells, and hence it is usual to place them in some symmetrical arrangement
so that they are, as nearly as may be, spread uniformly through the cells.

The difficulty of inserting the dots when the frequencies are large will
be obvious, and, in fact, such a scatter diagram rarely tells us more than we
can see from an inspection of the table itself. In contrast to this, the
scatter diagram of the data of Table 11.7 gives a much better picture of the
dependence of the two variates than can be obtained by mere inspection of
the ungrouped data of the table.

11.11. It is clear that a correlation table may be treated by the
methods discussed in Chapter 5, which are applicable to all contingency
tables, however formed. But the coefficient of contingency merely tells
us whether two variables are related, and if so, how closely. The methods
we shall now discuss go much further than this. The numerical character
of the variates and the arrangement of the correlation table in class-intervals
of equal widths enable us to approach the problem of investigating
the relationship between the variates with additional precision.

11.12. If the two variates in a contingency table are independent,
the distributions in parallel arrays are similar (5.18) ; hence their averages
and dispersions, i.e. their means and standard deviations, must be the same.
In general they will not be the same, and we are thus led to inquire into the
relation between the values of the means and standard deviations in
different arrays and the departure of the distribution from complete
independence.

11.13. The mean is the most important constant, in general, and for
the present we shall concentrate our attention upon it. Although the
values in arrays are scattered about their respective means, it is in most
cases profitable to inquire how the means of arrays are related ; this will
throw a good deal of light on the important question whether high values
of one variate show any tendency to be associated, on the average, with high
values of the other variate.

If possible, we also wish to know how great a divergence of one variate
from its mean is associated with a given divergence of the other, and to
obtain some idea of how closely the relation is usually fulfilled.

Lines of Regression. 

11.14. Let us then consider the means of arrays. Let OX, OY be
two axes at right angles representing the scales of the two variates. As in
the case of the scatter diagram we can plot the positions of the means ; for
example, if the mean of a row whose variate value is centred at $y_1$ is $m_1$,
we can plot the point whose abscissa is $m_1$ and whose ordinate is $y_1$. There
will thus be one point corresponding to each row and one to each column.
In practice, to distinguish the two, the means of rows are denoted by small
circles and the means of columns by small crosses. Fig. 11.8 shows such
a diagram drawn for the data of Table 11.3.

The means of rows and the means of columns will, in general, lie more
or less closely round smooth curves. For example, in fig. 11.8 they lie,
very approximately, on straight lines, RR and CC in the figure. Such
curves are said to be curves of regression, and their equations with
reference to the axes OX and OY are called regression equations. If
the lines of regression are straight, the regression is said to be linear. In
the contrary case it is said to be curvilinear.
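
The means of arrays which are plotted to give the curves of regression may be computed directly from a correlation table. The following Python sketch does so for the figures of Table 11.2; it is an illustration, not the authors' working. The function names are hypothetical, and the mid-values 17.5, 22.5, . . . assigned to the classes 15-, 20-, . . . (including the last, open class) are our assumption.

```python
# Frequencies of Table 11.2 (hundreds of marriages): rows are age of wife,
# columns are age of husband, both in 5-year classes beginning at 15.
TABLE = [
    [33, 189, 56, 8, 2, 0, 0, 0, 0, 0, 0, 0, 0],
    [18, 682, 585, 106, 19, 5, 2, 1, 0, 0, 0, 0, 0],
    [1, 140, 511, 179, 40, 14, 6, 3, 1, 1, 0, 0, 0],
    [0, 11, 75, 101, 42, 20, 10, 5, 2, 1, 1, 0, 0],
    [0, 2, 10, 24, 28, 19, 13, 8, 5, 2, 1, 0, 0],
    [0, 0, 1, 5, 9, 14, 12, 10, 6, 4, 2, 1, 0],
    [0, 0, 0, 1, 3, 5, 9, 9, 7, 4, 3, 1, 0],
    [0, 0, 0, 0, 0, 1, 3, 7, 6, 5, 3, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 3, 5, 4, 3, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 4, 3, 2, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 3, 2, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
]

# Assumed mid-values of the classes 15-, 20-, ... (an approximation).
husband_mid = [17.5 + 5 * j for j in range(13)]
wife_mid = [17.5 + 5 * i for i in range(12)]

def means_of_rows(table, column_mid):
    """Mean of the column variate within each row (array)."""
    return [sum(f * m for f, m in zip(row, column_mid)) / sum(row)
            for row in table]

def means_of_columns(table, row_mid):
    """Mean of the row variate within each column (array)."""
    return [sum(f * m for f, m in zip(col, row_mid)) / sum(col)
            for col in zip(*table)]

row_means = means_of_rows(TABLE, husband_mid)
column_means = means_of_columns(TABLE, wife_mid)

# Points (abscissa, ordinate) for the two sets of means.
row_points = list(zip(row_means, wife_mid))           # small circles
column_points = list(zip(husband_mid, column_means))  # small crosses
for y, m in zip(wife_mid, row_means):
    print(f"wives centred at {y:4.1f}: mean age of husband {m:5.2f}")
```

Plotting `row_points` and `column_points` on the same axes gives a diagram of the kind described above for Table 11.3.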

11.15. The term “regression” is not a particularly happy one from
the etymological point of view, but it is so firmly embedded in statistical
literature that we make no attempt to replace it by an expression which
would more suitably express its essential properties. It was introduced by
Galton in connection with the inheritance of stature. Galton found that
the sons of fathers who deviate x inches from the mean height of all fathers
themselves deviate from the mean height of all sons by less than x inches,
i.e. there is what Galton called a “regression to mediocrity.” In general,
the idea ordinarily attached to the word “regression” does not touch
upon this connotation, and it should be regarded merely as a convenient
term.

11.16. If two variates are independent, their regression lines are
straight and at right angles, the means of rows lying on a line parallel to
the axis OY and the means of columns on a line parallel to the axis OX,
for the distributions in parallel arrays are similar (see fig. 11.5). In any
case drawn from actual data, of course, the means might not lie exactly on
straight lines, owing to fluctuations of sampling.

11.17. The cases with which the experimentalist, e.g. the chemist or
physicist, has to deal, where the observations are all crowded closely
round a single line, lie at the opposite extreme from independence. The
entries fall into a few compartments only of each array, and the means of
rows and of columns lie approximately on one and the same curve, like the
line RR of fig. 11.6.

11.18. The ordinary cases of statistics are intermediate between these
two extremes, the lines of means being neither perpendicular as in fig. 11.5,
nor coincident as in fig. 11.6. One problem of the statistician is to find
expressions which will suffice to describe the regression lines, either exactly
or to a satisfactory degree of approximation.

In general this is a difficult problem, and the theory of curvilinear
regression is as yet incomplete. We can, however, make considerable
progress by confining ourselves to the cases in which the regression is linear.
Cases of this kind are more frequent than might be supposed, and in other
cases the means of arrays lie so irregularly, owing to the paucity of the
observations, that the real nature of the regression curve is not indicated
and a straight line will give as good an approximation as a more elaborate
curve.

11.19. Consider the simplest case in which the means of rows lie 
exactly on a straight line RR (fig. 11.7). Let be the mean value of F f 



THEORY OF STATISTICS. 


208 

and let RR cut the horizontal through M % , in M. Then it may be 
shown that the vertical through M must cut OX in the mean of X . 
For, let the slope of RR to the vertical, i.e. the tangent or the angle M X MR 



y 


Fig. 11.7. 

or ratio of kl to IM, be b l9 and let deviations from My, Mx be denoted by x 
and y. 

Then for any one row of type $y$ in which the number of observations
is $n$,

$$S(x) = n b_1 y$$

and therefore for the whole table, since $S(ny) = 0$,

$$S(x) = b_1 S(ny) = 0.$$

$M_1$ must therefore be the mean of $X$, and $M$ may
accordingly be termed the mean of the whole distribution.

Knowing that RR passes through the mean of the distribution, we can
determine it completely if we know the value of $b_1$.
For any one row we have

$$S(xy) = y\,S(x) = n b_1 y^2$$

Therefore for the whole table

$$S(xy) = b_1 S(ny^2) = N b_1 \sigma_y^2$$

Let us write

$$p = \frac{1}{N} S(xy) \qquad (11.1)$$

Then

$$b_1 = \frac{p}{\sigma_y^2} \qquad (11.2)$$

Similarly, if CC be the line on which lie the means of columns and $b_2$ is
the slope to the horizontal,

$$b_2 = \frac{p}{\sigma_x^2} \qquad (11.3)$$

Now let us define

$$r = \frac{p}{\sigma_x \sigma_y} = \frac{S(xy)}{\sqrt{S(x^2)\,S(y^2)}} \qquad (11.4)$$

Then

$$b_1 = r\frac{\sigma_x}{\sigma_y} \quad \text{and} \quad b_2 = r\frac{\sigma_y}{\sigma_x} \qquad (11.5)$$

and the equations of RR and CC, referred to the centre of the distribution,
are

$$x = r\frac{\sigma_x}{\sigma_y}\,y \quad \text{and} \quad y = r\frac{\sigma_y}{\sigma_x}\,x \qquad (11.6)$$

and, referred to the origin $O$,

$$(X - M_1) = r\frac{\sigma_x}{\sigma_y}(Y - M_2), \qquad (Y - M_2) = r\frac{\sigma_y}{\sigma_x}(X - M_1) \qquad (11.7)$$
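As a minimal sketch of equations (11.1) to (11.5), the quantities $p$, $r$, $b_1$ and $b_2$ may be computed directly from paired observations. The function name and the five pairs of values below are hypothetical, chosen only for illustration; Python is used merely for convenience.

```python
import math

def correlation_summary(X, Y):
    """Return (p, r, b1, b2) as defined in equations (11.1)-(11.5)."""
    N = len(X)
    m1 = sum(X) / N                           # mean of X
    m2 = sum(Y) / N                           # mean of Y
    x = [v - m1 for v in X]                   # deviations from the means
    y = [v - m2 for v in Y]
    s_xx = sum(v * v for v in x)              # S(x^2)
    s_yy = sum(v * v for v in y)              # S(y^2)
    s_xy = sum(a * b for a, b in zip(x, y))   # S(xy)
    p = s_xy / N                              # (11.1)
    r = s_xy / math.sqrt(s_xx * s_yy)         # (11.4)
    b1 = s_xy / s_yy                          # (11.2), i.e. p / sigma_y^2
    b2 = s_xy / s_xx                          # (11.3), i.e. p / sigma_x^2
    return p, r, b1, b2

# Hypothetical illustrative data (not taken from any table in this chapter).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
p, r, b1, b2 = correlation_summary(X, Y)
print(p, r, b1, b2)
```

Since $b_1 b_2 = r^2$ by (11.5), the product of the two printed regressions should equal the square of the printed $r$, which gives a quick check on such a computation.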

11.20. Let us now proceed to the case when the means of arrays are 
not situated on a straight line. This we shall treat by finding the next
best thing: straight lines which are the closest fit to the means.

The expression “closest fit,” as applied to the fitting of curves to points,
is one which we deal with at length in Chapter 17, and it is only necessary
to say at this stage that the straight line RR of closest fit to the means of
rows, i.e.

$$x = a_1 + b_1 y$$

will be determined by evaluating $a_1$ and $b_1$ so as to make the expression

$$E = S\{x - (a_1 + b_1 y)\}^2$$

(that is, the sum of the squares of the horizontal distances of the points
representing the observations from RR) a minimum. Here $x$ and $y$,
as before, denote deviations from the respective means of $X$ and $Y$, and
the summation is taken over all values of $x$ and $y$.

We have, expanding $E$,

$$E = S(a_1^2) - 2S\{a_1(x - b_1 y)\} + S(x - b_1 y)^2$$

The second term on the right vanishes, since $S(x) = S(y) = 0$, and hence

$$E = S(a_1^2) + S(x - b_1 y)^2$$

Now $a_1$ and $b_1$ can be chosen independently, and hence $E$ is a minimum
only if $S(a_1^2) = 0$, i.e.

$$a_1 = 0 \qquad (11.8)$$

Thus the line of closest fit goes through the mean of the distribution.
Hence,

$$E = S(x - b_1 y)^2 = \left\{ b_1\sqrt{S(y^2)} - \frac{S(xy)}{\sqrt{S(y^2)}} \right\}^2 + S(x^2) - \frac{\{S(xy)\}^2}{S(y^2)} \qquad (11.9)$$

This is a minimum when the first term (a square) is zero, i.e. when

$$b_1 = \frac{S(xy)}{S(y^2)} \qquad (11.10)$$

which is the same as equation (11.2).

We may show similarly that the line of closest fit CC, given by

$$y = a_2 + b_2 x$$

has

$$a_2 = 0, \qquad b_2 = \frac{S(xy)}{S(x^2)}$$

which is the same as equation (11.3).

If we regard the equation

$$x = a_1 + b_1 y$$

as one for estimating $x$ from $y$, we may take $x - a_1 - b_1 y$ as an error of
estimation, and $E$ will then be the sum of the squares of such errors. The
condition that $E$ is a minimum is then equivalent to the condition that the
sum of squares of errors of estimation shall be a minimum. This is one
form of the so-called “Principle of Least Squares” (see Chapter 17).
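The minimum property may be illustrated numerically. The sketch below, in Python, evaluates $E$ for the values $a_1 = 0$ and $b_1 = S(xy)/S(y^2)$ and for a few neighbouring values; the deviations used are hypothetical and the function name is ours.

```python
def sum_sq_errors(x, y, a1, b1):
    """E = S{x - (a1 + b1*y)}^2 for deviations x, y from their means."""
    return sum((xi - (a1 + b1 * yi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical deviations from the means (each list sums to zero).
x = [-2, -1, 0, 1, 2]
y = [-2, 0, 1, 0, 1]

s_xy = sum(a * b for a, b in zip(x, y))     # S(xy)
s_xx = sum(a * a for a in x)                # S(x^2)
s_yy = sum(b * b for b in y)                # S(y^2)

b1_best = s_xy / s_yy                       # equation (11.10)
e_min = sum_sq_errors(x, y, 0.0, b1_best)   # a1 = 0 by equation (11.8)

# Every small change in a1 or b1 should increase E.
perturbed = [
    sum_sq_errors(x, y, da, b1_best + db)
    for da in (-0.1, 0.0, 0.1)
    for db in (-0.1, 0.0, 0.1)
    if (da, db) != (0.0, 0.0)
]
print(e_min, min(perturbed))
```

The value of `e_min` should also agree with $S(x^2) - \{S(xy)\}^2 / S(y^2)$, the part of (11.9) that does not depend on $b_1$.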

11.21. Equations (11.6) and (11.7) are thus of general application.
If the regression is exactly linear they give the lines of regression. If the
regression departs from linearity, either owing to sampling effects or owing
to real divergences, they give the “best” straight regression lines which
the data admit. We may regard the equations as either (a) equations for
estimating an individual $x$ from its associated $y$ (or $y$ from its associated $x$)
in such a way that the sum of squares of errors of estimation is a minimum;
or (b) equations for estimating the mean of the $x$'s associated with a
particular $y$ (or the mean of the $y$'s associated with a particular $x$) in such a
way that the sum of the squares of errors of estimation is a minimum,
each mean being counted proportionately to the number of observations
on which it is based.


Coefficient of Correlation.

11.22. The coefficient $r$ defined in equation (11.4) is of very great
importance. It is called the coefficient of correlation.
$r$ cannot exceed $+1$ or be less than $-1$.

For, from equation (11.9) we see that the value of $E$ is

$$S(x - b_1 y)^2 = S(x^2) - \frac{\{S(xy)\}^2}{S(y^2)} = S(x^2)\{1 - r^2\} \qquad (11.11)$$

But $E$ is the sum of a number of squares and cannot be negative.
Hence,

$$1 - r^2 \geq 0$$

which proves the result.

If $r = +1$, the regression equations are identical, as may be seen from
equations (11.6), and hence the lines RR and CC coincide. In this case it
follows from (11.11) that for all pairs of values of the variates

$$x - b_1 y = 0$$

i.e. all values lie on a single straight line. Thus to one value of $x$ there
corresponds one, and only one, value of $y$. This is the case we mentioned
in 11.17, and since high values of $x$ correspond to high values of $y$, the
variables may be said to be perfectly positively correlated.

Fig. 11.8. Correlation between Stature of Father and Stature of Son (Table 11.3):
means of rows shown by circles and means of columns by crosses: r = +0.51.

Similarly, if $r = -1$, the pairs of values all lie on a single straight line as
before, but high values of one will be associated with low values of the
other. In this case we can say that the variates are perfectly negatively
correlated.

Fig. 11.9. Correlation between Age and Weekly Yield of Milk from Cows (Table 11.4):
means of rows shown by circles and means of columns by crosses: r = +0.22.

Finally, if the variates are independent, $r$ is zero, for $b_1$ and $b_2$ are zero,
and the lines of regression are parallel to OX and OY. It does not follow,
however, that if $r$ is zero the variates are independent; the fact that $r$ is
zero implies only that the means of arrays lie scattered around two straight
lines which do not exhibit any definite trend away from the horizontal or
the vertical as the case may be. Two variates for which $r$ is zero may,
however, be spoken of as uncorrelated. Table 11.6 will serve as a case
where the variates are almost uncorrelated but by no means independent,
$r$ being very small (-0.014) (see fig. 11.10), but the coefficient of
contingency $C$ (for the grouping of Exercise 11.3) 0.47. Figs. 11.8 and
11.9 are drawn from the data of Tables 11.3 and 11.4, for which $r$ has
the values +0.51 and +0.22 respectively. The student should study
such tables and diagrams closely, and endeavour to accustom himself
to estimating the value of $r$ from the general appearance of the table.
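That zero correlation does not imply independence may be seen from a small constructed case. In the sketch below (Python; the data and the function name are hypothetical) $Y$ is completely determined by $X$, yet $r$ vanishes because the relation is symmetrical about the mean of $X$.

```python
import math

def corr(X, Y):
    """Product-moment correlation coefficient of paired values."""
    N = len(X)
    mx, my = sum(X) / N, sum(Y) / N
    s_xy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    s_xx = sum((a - mx) ** 2 for a in X)
    s_yy = sum((b - my) ** 2 for b in Y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: Y is an exact function of X, so the variates are
# as far from independent as they can be.
X = [-2, -1, 0, 1, 2]
Y = [v * v for v in X]

r = corr(X, Y)
print(r)
```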


Fig. 11.10. Correlation between Births in a Registration District and Proportion of Male
Births per Thousand of All Births in England and Wales, 1881-90 (Table 11.6):
means of rows shown by circles and means of columns by crosses: r = -0.014.

Coefficients of Regression. 

11.23. The two quantities

$$b_1 = r\frac{\sigma_x}{\sigma_y}, \qquad b_2 = r\frac{\sigma_y}{\sigma_x}$$

are called coefficients of regression, $b_1$ being the regression of $x$ on $y$, or
deviation in $x$ corresponding on the average to a unit change in $y$, and $b_2$
being similarly the regression of $y$ on $x$.

The coefficient of correlation is always a pure number, but the coefficients
of regression are only pure numbers if the variates are the same in kind;
for they depend on the ratio $\sigma_x / \sigma_y$, and consequently on the units in which
$x$ and $y$ are measured.

Since $r$ is not greater than unity, at least one of the coefficients of
regression cannot exceed unity; but the other may be greater than unity, if
$\sigma_x / \sigma_y$ or $\sigma_y / \sigma_x$ be large.


11.24. The two standard deviations,

$$s_x = \sigma_x\sqrt{1 - r^2}, \qquad s_y = \sigma_y\sqrt{1 - r^2}$$

are of considerable importance. It follows from (11.11) that $s_x$ is the
standard deviation of $(x - b_1 y)$, and similarly $s_y$ is the standard deviation
of $(y - b_2 x)$. Hence we may regard $s_x$ and $s_y$ as the standard errors
(root-mean-square errors) made in estimating $x$ from $y$ and $y$ from $x$ by the
respective regression equations

$$x = b_1 y, \qquad y = b_2 x$$

$s_x$ may also be regarded as a kind of average standard deviation of a row
about RR, and $s_y$ as an average standard deviation of a column about CC.
In an ideal case, where the regression is truly linear and the standard
deviations of all parallel arrays are equal, a case to which the distribution
of Table 11.3 is a rough approximation,¹ $s_x$ is the standard deviation of the
$x$-array and $s_y$ the standard deviation of the $y$-array. Hence $s_x$ and $s_y$ are
sometimes termed the “standard deviations of arrays.”
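The identity between $s_x = \sigma_x\sqrt{1 - r^2}$ and the root-mean-square error of the estimates $x = b_1 y$ may be checked on a small example. The following Python sketch uses hypothetical deviations; it is an illustration only, not part of the worked examples of this chapter.

```python
import math

# Hypothetical deviations from the means (each list sums to zero).
x = [-2, -1, 0, 1, 2]
y = [-2, 0, 1, 0, 1]
N = len(x)

s_xx = sum(a * a for a in x)
s_yy = sum(b * b for b in y)
s_xy = sum(a * b for a, b in zip(x, y))

r = s_xy / math.sqrt(s_xx * s_yy)
b1 = s_xy / s_yy
sigma_x = math.sqrt(s_xx / N)

# Standard error of estimate by the formula of 11.24 ...
s_x_formula = sigma_x * math.sqrt(1 - r * r)
# ... and directly, as the root-mean-square of the errors x - b1*y.
s_x_direct = math.sqrt(sum((a - b1 * b) ** 2 for a, b in zip(x, y)) / N)

print(s_x_formula, s_x_direct)
```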

Calculation of the Coefficient of Correlation. 

11.25. We now proceed to the arithmetical work involved in calculating
the correlation coefficient.

For this purpose we use the formula (11.4), i.e.

$$r = \frac{S(xy)}{N\sigma_x\sigma_y} = \frac{S(xy)}{\sqrt{S(x^2)\,S(y^2)}}$$

The calculation of $S(x^2)$, or $\sigma_x$, and of $S(y^2)$, or $\sigma_y$, proceeds exactly
as in Chapter 8. The only expression of a novel type is the quantity
$S(xy)$, which we may call the first product-moment of the distribution.²

As in the case of univariate distributions, the form of the arithmetic is
slightly different according as the observations are grouped or ungrouped.

¹ Arrays in which the standard deviations are equal are sometimes said to be
“homoscedastic”; in the contrary case “heteroscedastic.”

² In generalisation of the definition of moments of a univariate distribution in
Chapter 9 we may define the product-moments of a bivariate universe as

$$\mu_{rs} = \frac{1}{N} S(f x^r y^s)$$

where $f$ is the frequency. This gives us $\mu_{11}$, the quantity we have called $p$
in equation (11.1).

11.26. Our work is greatly simplified by the use of devices similar to
those employed in calculating the means and other moments of univariate
distributions.

(a) We take working means for the two variates, obtained by inspection,
and transfer our moments to those about the means after the bulk
of the arithmetic has been performed. For the first product-moment
we have, in fact, if $\xi$, $\eta$ are the deviations from the working means and
$\bar\xi$, $\bar\eta$ the deviations of the true means from the working means:

$$\xi = x + \bar\xi, \qquad \eta = y + \bar\eta$$

Hence,

$$\xi\eta = xy + \bar\xi y + x\bar\eta + \bar\xi\bar\eta$$

Summing for all members of the universe, since $S(\bar\xi y) = \bar\xi S(y) = 0$ and
similarly $S(x\bar\eta) = 0$, $x$ and $y$ being deviations from the true means,

$$S(\xi\eta) = S(xy) + N\bar\xi\bar\eta$$

Hence,

$$S(xy) = S(\xi\eta) - N\bar\xi\bar\eta \qquad (11.12)$$

This gives us the product-moment about the true means in terms of the
product-moment about the working means and the deviations of the true
means from the working means.

(b) As a check on the rather heavy arithmetic which is frequently
involved, it is advisable to use a method similar to that of 8.10. We have

$$S(\xi + 1)(\eta + 1) = S(\xi\eta) + S(\xi) + S(\eta) + N \qquad (11.13)$$

If, therefore, we calculate $S(\xi + 1)(\eta + 1)$ as well as $S(\xi\eta)$ we shall have in
the above equation a check on the accuracy of our work.

(c) We take the class-intervals as units and transfer to other units
afterwards as desired.
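Devices (a) and (b) may be sketched as follows in Python. For brevity the sketch uses only the first five pairs of values (January to May 1931) from columns 2 and 3 of Table 11.8, with the working means of 90 adopted in Example 11.1; the variable names are ours.

```python
# First five months of Table 11.8 (X: feeding-stuffs, Y: oats).
X = [78, 77, 85, 88, 87]
Y = [84, 82, 82, 85, 89]
A, B = 90, 90                    # working means, as in Example 11.1
N = len(X)

xi = [v - A for v in X]          # deviations from the working means
eta = [v - B for v in Y]
S_xi, S_eta = sum(xi), sum(eta)
S_xieta = sum(a * b for a, b in zip(xi, eta))

# (a) Product-moment about the true means by equation (11.12).
xi_bar, eta_bar = S_xi / N, S_eta / N
S_xy_shortcut = S_xieta - N * xi_bar * eta_bar

# The same quantity computed directly from the true means.
mx, my = sum(X) / N, sum(Y) / N
S_xy_direct = sum((a - mx) * (b - my) for a, b in zip(X, Y))

# (b) Arithmetical check by equation (11.13).
check_lhs = sum((a + 1) * (b + 1) for a, b in zip(xi, eta))
check_rhs = S_xieta + S_xi + S_eta + N

print(S_xy_shortcut, S_xy_direct, check_lhs, check_rhs)
```

The two values of $S(xy)$ should agree, and the two sides of the check should be equal; a discrepancy in either would point to an arithmetical slip.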

Example 11.1, Table 11.8. Let us investigate the correlation and
regressions of the variates of Table 11.7, the data of which are ungrouped.
The variates are (1) the price index-number of animal feeding-stuffs, $X$,
and (2) the price index-number of home-grown oats, $Y$. The values of
the variates themselves are shown in columns 2 and 3 of Table 11.8. We
take a working mean at $X = 90$ and $Y = 90$, and the deviations from these
values are shown in columns 4 and 5. The remaining columns 6 to 13
give the squares and product of the deviations together with the various
auxiliary quantities used for checking purposes. Finally, the various
sums are shown at the bottom of the table.

In practice it is as well to show the negative values which may occur in 
columns 4, 5, 6, 7, 12 and 13 (particularly the last two) in a separate column, 
so as to facilitate addition and avoid mistakes. We have refrained from 
this course for convenience of printing. 

As a check on the arithmetic we have:

$$-118 = S(\xi) = S(\xi + 1) - N = -58 - 60$$

$$2924 = S(\xi + 1)^2 = S(\xi^2) + 2S(\xi) + N = 3100 - 236 + 60$$

$$2493 = S(\xi + 1)(\eta + 1) = S(\xi\eta) + S(\xi) + S(\eta) + N = 2565 - 118 - 14 + 60 = 2493$$


etc., and 





Table 11.8. Correlation between Monthly Index-numbers of Prices of (1) Animal
Feeding-stuffs and (2) Home-grown Oats in Years 1931-35.

1.            2.    3.    4.    5.    6.    7.    8.      9.    10.      11.   12.          13.
Month.        X.    Y.    ξ.    η.  ξ+1.  η+1.   ξ².  (ξ+1)².   η².  (η+1)².   ξη.  (ξ+1)(η+1).

1931 Jan.     78    84   -12    -6   -11    -5   144     121    36      25    72           55
     Feb.     77    82   -13    -8   -12    -7   169     144    64      49   104           84
     Mar.     85    82    -5    -8    -4    -7    25      16    64      49    40           28
     Apr.     88    85    -2    -5    -1    -4     4       1    25      16    10            4
     May      87    89    -3    -1    -2     0     9       4     1       0     3            0
     June     82    90    -8     0    -7     1    64      49     0       1     0           -7
     July     81    88    -9    -2    -8    -1    81      64     4       1    18            8
     Aug.     77    92   -13     2   -12     3   169     144     4       9   -26          -36
     Sept.    76    83   -14    -7   -13    -6   196     169    49      36    98           78
     Oct.     83    89    -7    -1    -6     0    49      36     1       0     7            0
     Nov.     97    98     7     8     8     9    49      64    64      81    56           72
     Dec.     93    99     3     9     4    10     9      16    81     100    27           40
1932 Jan.     95   102     5    12     6    13    25      36   144     169    60           78
     Feb.     97   102     7    12     8    13    49      64   144     169    84          104
     Mar.    102   105    12    15    13    16   144     169   225     256   180          208
     Apr.     99   105     9    15    10    16    81     100   225     256   135          160
     May      97   107     7    17     8    18    49      64   289     324   119          144
     June     94   107     4    17     5    18    16      25   289     324    68           90
     July     94   101     4    11     5    12    16      25   121     144    44           60
     Aug.     97   106     7    16     8    17    49      64   256     289   112          136
     Sept.    92    96     2     6     3     7     4       9    36      49    12           21
     Oct.     89    90    -1     0     0     1     1       0     0       1     0            0
     Nov.     90    85     0    -5     1    -4     0       1    25      16     0           -4
     Dec.     90    81     0    -9     1    -8     0       1    81      64     0           -8
1933 Jan.     92    84     2    -6     3    -5     4       9    36      25   -12          -15
     Feb.     91    85     1    -5     2    -4     1       4    25      16    -5           -8
     Mar.     90    84     0    -6     1    -5     0       1    36      25     0           -5
     Apr.     86    81    -4    -9    -3    -8    16       9    81      64    36           24
     May      85    76    -5   -14    -4   -13    25      16   196     169    70           52
     June     85    77    -5   -13    -4   -12    25      16   169     144    65           48
     July     85    75    -5   -15    -4   -14    25      16   225     196    75           56
     Aug.     83    79    -7   -11    -6   -10    49      36   121     100    77           60
     Sept.    80    78   -10   -12    -9   -11   100      81   144     121   120           99
     Oct.     78    78   -12   -12   -11   -11   144     121   144     121   144          121
     Nov.     80    76   -10   -14    -9   -13   100      81   196     169   140          117
     Dec.     83    75    -7   -15    -6   -14    49      36   225     196   105           84
1934 Jan.     82    80    -8   -10    -7    -9    64      49   100      81    80           63
     Feb.     83    91    -7     1    -6     2    49      36     1       4    -7          -12
     Mar.     85    87    -5    -3    -4    -2    25      16     9       4    15            8
     Apr.     83    84    -7    -6    -6    -5    49      36    36      25    42           30
     May      82    81    -8    -9    -7    -8    64      49    81      64    72           56
     June     85    83    -5    -7    -4    -6    25      16    49      36    35           24
     July     88    83    -2    -7    -1    -6     4       1    49      36    14            6
     Aug.    101    92    11     2    12     3   121     144     4       9    22           36
     Sept.   102    98    12     8    13     9   144     169    64      81    96          117
     Oct.     98    94     8     4     9     5    64      81    16      25    32           45
     Nov.     96    94     6     4     7     5    36      49    16      25    24           35
     Dec.     98    95     8     5     9     6    64      81    25      36    40           54
1935 Jan.     98   100     8    10     9    11    64      81   100     121    80           99
     Feb.     92    99     2     9     3    10     4       9    81     100    18           30
     Mar.     92    96     2     6     3     7     4       9    36      49    12           21
     Apr.     90    98     0     8     1     9     0       1    64      81     0            9
     May      88    97    -2     7    -1     8     4       1    49      64   -14           -8
     June     86    98    -4     8    -3     9    16       9    64      81   -32          -27
     July     83    99    -7     9    -6    10    49      36    81     100   -63          -60
     Aug.     80    92   -10     2    -9     3   100      81     4       9   -20          -27
     Sept.    81    90    -9     0    -8     1    81      64     0       1     0           -8
     Oct.     86    89    -4    -1    -3     0    16       9     1       0     4            0
     Nov.     83    87    -7    -3    -6    -2    49      36     9       4    21           12
     Dec.     82    83    -8    -7    -7    -6    64      49    49      36    56           42

Total                   -118   -14   -58    46  3100    2924  4814    4846  2565         2493






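We have, then, about the working means:

$$\bar\xi = -\frac{118}{60} = -1.9667$$

$$\bar\eta = -\frac{14}{60} = -0.2333$$

$$\sigma_x^2 = \frac{3100}{60} - \bar\xi^2 = 47.7989, \qquad \sigma_x = 6.914$$

$$\sigma_y^2 = \frac{4814}{60} - \bar\eta^2 = 80.1789, \qquad \sigma_y = 8.954$$

$$p = \frac{S(\xi\eta)}{N} - \bar\xi\bar\eta = 42.75 - 0.4589 = 42.2911$$

$$r = \frac{p}{\sigma_x\sigma_y} = \frac{42.2911}{6.914 \times 8.954} = +0.68$$

Further, working the regressions in the way best to avoid errors in
rounding off,

$$b_1 = \frac{p}{\sigma_y^2} = 0.527$$

$$b_2 = \frac{p}{\sigma_x^2} = 0.885$$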

Thus the correlation coefficient is 0.68, and the regression equations,
referred to the means, are:

$$x = 0.527\,y$$

$$y = 0.885\,x$$

If we prefer to express these equations with origin at $X = 0$, $Y = 0$,
we have:

$$X - (90 - 1.97) = X - 88.03 = 0.527(Y - 89.77)$$

$$Y - (90 - 0.23) = Y - 89.77 = 0.885(X - 88.03)$$

which reduce to

$$X = 0.527\,Y + 40.72 \qquad (a)$$

$$Y = 0.885\,X + 11.86 \qquad (b)$$

The lines of regression are drawn on the scatter diagram of fig. 11.4. 

The standard errors made in using these equations to estimate the
index-number of animal feeding-stuffs from that of oats, and vice versa, are
respectively:

$$\sigma_x\sqrt{1 - r^2} = 5.07$$

$$\sigma_y\sqrt{1 - r^2} = 6.57$$

Equation (a) tells us that a rise of one point in the price index-number of
oats is accompanied on the average by a rise of 0.527 point in the price
index-number of feeding-stuffs. Similarly, equation (b) tells us that a rise of
one point in the index for feeding-stuffs is accompanied on the average by a
rise of 0.885 point in the price index-number of oats.
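The arithmetic of Example 11.1 may be retraced from the column totals alone. The following Python sketch takes the sums quoted above about the working means of 90 and recovers $p$, $r$ and the two regressions; the variable names are ours.

```python
import math

# Totals about the working means X = 90, Y = 90 (Example 11.1).
N = 60
S_xi, S_eta = -118, -14          # S(xi), S(eta)
S_xi2, S_eta2 = 3100, 4814       # S(xi^2), S(eta^2)
S_xieta = 2565                   # S(xi*eta)

xi_bar = S_xi / N
eta_bar = S_eta / N
var_x = S_xi2 / N - xi_bar ** 2          # sigma_x^2
var_y = S_eta2 / N - eta_bar ** 2        # sigma_y^2
p = S_xieta / N - xi_bar * eta_bar       # mean product about the true means

r = p / math.sqrt(var_x * var_y)
b1 = p / var_y                           # regression of x on y
b2 = p / var_x                           # regression of y on x

print(round(var_x, 4), round(var_y, 4), round(p, 4))
print(round(r, 2), round(b1, 3), round(b2, 3))
```

If unrounded values are carried through to the constant terms of equations (a) and (b), they will differ slightly in the second decimal place from the figures in the text, which are formed from the rounded means and regressions.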

It is important to note that the regression equations do not tell us
whether a variation in one variate is caused by a variation in the other;
all we know is that the two vary together, and so far as the regression
equations show, either the feeding-stuffs price may exert an influence on
the oats price, or vice versa, or their common variation may be due to
some other cause affecting both. This is only one instance of a difficulty
which pervades the theory of correlation and regression, namely,
that of interpreting results in terms of causal factors.

Example 11.2, Table 11.9. We now consider an example based on
grouped data. In this we have omitted the auxiliary quantities necessary
for checking in order to save space.

(Unpublished data; measurements by G. U. Yule.) The two variables
are (1) $X$, the length of a mother-frond of duckweed (Lemna minor);
(2) $Y$, the length of the daughter-frond. The mother-frond was measured
when the daughter-frond separated from it, and the daughter-frond when
its first daughter-frond separated. Measures were taken from camera
drawings made with the Zeiss-Abbe camera under a low power, the actual
magnification being 24 : 1. The units of length in the tabulated
measurements are millimetres on the drawings.

The arbitrary origin for both $X$ and $Y$ was taken at 105 mm. The
following are the values found for the constants of the single
distributions:

$\bar\xi = -1.058$ intervals $= -6.3$ mm.; $M_1 = 98.7$ mm. on drawing $= 4.11$ mm. actual

$\sigma_x = 2.828$ intervals $= 17.0$ mm. on drawing $= 0.707$ mm. actual

$\bar\eta = -0.203$ interval $= -1.2$ mm.; $M_2 = 103.8$ mm. on drawing $= 4.32$ mm. actual

$\sigma_y = 3.084$ intervals $= 18.5$ mm. on drawing $= 0.771$ mm. actual

To calculate $S(\xi\eta)$ the value of $\xi\eta$ is first written in every compartment
of the table against the corresponding frequency, treating the class-interval
as unit. In Table 11.9 frequencies are shown in ordinary type
and the values of $\xi\eta$ in heavy type. In making these entries the sign
of the product may be neglected, but it must be remembered that this
sign will be positive in the upper left-hand and lower right-hand quadrants,
and negative in the two others. The frequencies are then collected,
according to the magnitude and sign of $\xi\eta$, in columns 2 and 3 of Table
11.10. When columns 2 and 3 are completed they should be checked
to see that no frequency has been dropped, which may readily be done
by adding together the totals of the two columns and the frequency
in the 8th row and 8th column of Table 11.9 (the row and column for
which $\xi\eta = 0$), care being taken not to count twice the frequency in the
compartment common to the two. This grand total must clearly be
equal to $N$, the total number of observations, which in this case is 266.
The numbers in column 4 are given by deducting the entries in column 3
from those in column 2. The totals so obtained are multiplied by $\xi\eta$
(column 1) and the products entered in column 5 or 6 according to sign.
The algebraic sum of these totals gives

$$S(\xi\eta) = +1519.5$$





Table 11.10.

1.       2.            3.            4.        5.          6.
ξη.      Frequencies.                Total.    Products.
         + Quadrants.  - Quadrants.            +           -

 1          0             8.5         -8.5       0           8.5
 2         17            13.5         +3.5       7           0
 3         10.5           9           +1.5       4.5         0
 4         13.5           6.5         +7        28           0
 5          2             0.5         +1.5       7.5         0
 6         13.5           5           +8.5      51           0
 8         13             1          +12        96           0
 9          9             4           +5        45           0
10          6.5           1           +5.5      55           0
12         17.5           0          +17.5     210           0
14          1             0           +1        14           0
15          6             0           +6        90           0
16          7             0           +7       112           0
18          2             0           +2        36           0
20          8             0           +8       160           0
21          2             0           +2        42           0
24          6             0           +6       144           0
25          1             0           +1        25           0
28          1             0           +1        28           0
30          3             0           +3        90           0
36          1             0           +1        36           0
40          1             0           +1        40           0
42          2             0           +2        84           0
60          1             0           +1        60           0
63          1             0           +1        63           0

Totals    145.5          49                   +1528         -8.5

Check on frequencies: 145.5 + 49 + 71.5 = 266.
Algebraic sum of products: 1528 - 8.5 = 1519.5.

Hence, dividing by 266,

$$\frac{1}{N}S(\xi\eta) = 5.712$$

Hence,

$$p = 5.712 - \bar\xi\bar\eta = 5.712 - 0.215 = 5.497$$

$$r = \frac{p}{\sigma_x\sigma_y} = \frac{5.497}{2.828 \times 3.084} = +0.63$$

The regression of daughter-frond on mother-frond is 0.69 (a value
which will not be affected by altering the units of measurement for both
mother- and daughter-fronds, as such an alteration will affect both
standard deviations equally). Hence, the regression equation giving the
average actual length (in millimetres) of daughter-fronds for mother-fronds
of actual length $X$ is

$$Y = 1.48 + 0.69\,X$$

We leave it to the student to work out the second regression equation
giving the average length of mother-fronds for daughter-fronds of length $Y$,
and to check the whole work by a diagram showing the lines of regression
and the means of arrays for the central portion of the table.
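The closing steps of Example 11.2 may be sketched in Python as follows. The constants are those quoted above in class-interval units, and the means in actual millimetres are the rounded values 4.11 and 4.32; the variable names are ours, and the regression is rounded to two places before the constant term is formed, as in the text.

```python
# Quantities quoted in Example 11.2 (class-interval units).
N = 266
S_xieta = 1519.5                      # algebraic sum from Table 11.10
xi_bar, eta_bar = -1.058, -0.203
sigma_x, sigma_y = 2.828, 3.084

p = S_xieta / N - xi_bar * eta_bar    # mean product about the true means
r = p / (sigma_x * sigma_y)
b2 = p / sigma_x ** 2                 # regression of daughter on mother

# Regression equation in actual millimetres, using the rounded means
# and the regression rounded to two places.
mean_X_mm, mean_Y_mm = 4.11, 4.32
b2_rounded = round(b2, 2)
intercept = mean_Y_mm - b2_rounded * mean_X_mm

print(round(r, 2), b2_rounded, round(intercept, 2))
```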

Example 11.3, Table 11.2. The following device is frequently useful,
and saves a considerable amount of labour in calculating the product
term $S(xy)$.

We have:

$$S(x - y)^2 = S(x^2) - 2S(xy) + S(y^2) \qquad \text{(i)}$$

and

$$S(x + y)^2 = S(x^2) + 2S(xy) + S(y^2) \qquad \text{(ii)}$$

Hence, knowing $S(x^2)$ and $S(y^2)$, we can find $S(xy)$ if we know either
$S(x - y)^2$ or $S(x + y)^2$. These quantities are often easier to calculate than
$S(xy)$ itself.

Consider the data of Table 11.2. In the usual way, taking a working
mean centred in the intervals $X = 25$- years, $Y = 25$- years, we have, in
units of five years:

$$\bar\xi = +0.2924, \qquad \bar\eta = -0.2353$$

$$S(\xi^2) = 9708, \qquad S(\eta^2) = 7090$$

$$\sigma_x = 1.730, \qquad \sigma_y = 1.481$$

Now the value of $\xi - \eta$ is constant down diagonals which run from the
top left hand to the bottom right hand of the table. In fact, for the
principal diagonal, running from $X = 15$-, $Y = 15$- through $X = 20$-, $Y = 20$-
etc., $\xi - \eta = 0$. For the diagonal above this, running from $X = 20$-, $Y = 15$-
through $X = 25$-, $Y = 20$- etc., $\xi - \eta = 1$, and so on.

Let us then find the diagonal totals. We find:

ξ - η      Frequency in diagonal
 -3              4
 -2             34
 -1            280
  0           1398
  1           1051
  2            263
  3             73
  4             31
  5             12
  6              5
  7              2

Total         3153

The total is the total frequency, which gives a check on the work.

The value of $S(\xi - \eta)^2$ for the whole table is then obtained from the
above table by squaring the values in the left-hand column, multiplying
by the corresponding frequency in the right-hand column and adding.
We get

$$S(\xi - \eta)^2 = (9 \times 4) + (4 \times 34) + (1 \times 280) + \ldots + (49 \times 2) = 4286$$

Hence, from (i),

$$4286 = 9708 + 7090 - 2S(\xi\eta)$$

$$S(\xi\eta) = 6256$$

whence

$$p = \frac{6256}{3153} - \bar\xi\bar\eta = +2.0529$$

$$r = \frac{p}{\sigma_x\sigma_y} = \frac{2.0529}{1.730 \times 1.481} = 0.80$$

The regression equations may now be obtained in the usual manner. 

In the above work we chose equation (i) in preference to equation (ii) 
because the frequencies are seen by inspection to run mainly from the 
top left hand to the bottom right hand of the table. Had they run from 
the top right hand to the bottom left hand we should probably have
found it better to use equation (ii). 
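The diagonal device of Example 11.3 may be sketched in Python as follows. The diagonal frequencies are those tabulated above, the entry for $\xi - \eta = 2$ being read as 263, the value that makes the diagonal totals add to the total frequency 3153; the variable names are ours.

```python
import math

# Diagonal totals of Table 11.2: value of (xi - eta) -> frequency.
diagonal_freq = {-3: 4, -2: 34, -1: 280, 0: 1398, 1: 1051,
                 2: 263, 3: 73, 4: 31, 5: 12, 6: 5, 7: 2}

N = sum(diagonal_freq.values())                               # total frequency
S_diff2 = sum(d * d * f for d, f in diagonal_freq.items())    # S(xi - eta)^2

S_xi2, S_eta2 = 9708, 7090                   # quoted in the example
S_xieta = (S_xi2 + S_eta2 - S_diff2) / 2     # from equation (i)

xi_bar, eta_bar = 0.2924, -0.2353
p = S_xieta / N - xi_bar * eta_bar
var_x = S_xi2 / N - xi_bar ** 2
var_y = S_eta2 / N - eta_bar ** 2
r = p / math.sqrt(var_x * var_y)

print(N, S_diff2, S_xieta, round(p, 4), round(r, 2))
```

Had the frequencies run from the top right hand to the bottom left hand of the table, the same sketch would apply with the totals collected along the other set of diagonals and equation (ii) used in place of (i).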

11.27. The student should be careful to remember the following 
points in working :— 

(1) To give $S(\xi\eta)$ and $\bar\xi\bar\eta$ their correct signs in finding the true mean
deviation product $p$.

(2) To express $\sigma_x$ and $\sigma_y$ in terms of the class-interval as a unit, in the
value of $r = p / \sigma_x\sigma_y$, for these are the units in terms of which $p$ has been
calculated.

(3) To use the proper units for the standard deviations (not class-intervals
in general) in calculating the coefficients of regression: in forming
the regression equation in terms of the absolute values of the variables,
for example, as above, the work will be wrong unless means and standard
deviations are expressed in the same units.


Fluctuations of Sampling. 

11.28. Further, it must always be remembered that correlation
coefficients, like other statistical measures, are subject to fluctuations of
sampling. We shall consider this point at some length in later chapters
(21 and 28), since the correlation coefficient has certain individual features
which make it of special interest from the sampling point of view. We
may, however, at this stage stress that if the number of observations is
small, no significance can be attached to small, or even moderately large,
values of $r$ as indicating a real correlation in the universe from which the
observations are drawn. For example, if $N = 36$, a value of $r = \pm 0.5$ may
be a chance result, though a very infrequent one, in sampling from an
uncorrelated universe; if $N = 100$, $r = \pm 0.3$ may similarly be a mere
fluctuation of sampling, though again a very infrequent one. The student
should therefore be careful in interpreting his coefficients.
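The two illustrative cases may be compared with a rough yardstick. The Python sketch below assumes, as a large-sample approximation that this section does not itself state, that the standard error of $r$ in sampling from an uncorrelated universe is about $1/\sqrt{N}$, and reads the two cases as $N = 36$, $r = 0.5$ and $N = 100$, $r = 0.3$; the function name is ours.

```python
import math

def approx_se_of_r_uncorrelated(n):
    """Assumed large-sample approximation 1/sqrt(N) for the standard
    error of r when the universe is uncorrelated."""
    return 1 / math.sqrt(n)

# (N, r) pairs as read from the text above.
cases = [(36, 0.5), (100, 0.3)]
ratios = [r / approx_se_of_r_uncorrelated(n) for n, r in cases]
print(ratios)
```

On this assumption both values lie at about three times the standard error, which is consistent with their being described as possible but very infrequent chance results.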

Corrections for Grouping.

11.29. In this connection we may mention the question whether, in
calculating the correlation coefficient from grouped data, any correction
is to be made analogous to the Sheppard correction for grouping which
we have considered in the case of univariate data. In the examples
considered in the foregoing we have not made such corrections.

It appears that, when the distribution is reasonably symmetrical and
obeys conditions similar to those enunciated in 8.11, page 141, we may,
with advantage, correct the standard deviations $\sigma_x$, $\sigma_y$, by applying to
each the formula

$$\sigma^2\ (\text{corrected}) = \sigma^2 - \frac{h^2}{12}$$

where $h$ is the width of the interval. The product term $S(xy)$ needs no
such correction.

We pointed out in 8.11, however, that sampling fluctuations usually
obliterate any correction for grouping unless the size of the sample is large.
It may, as before, be suggested that unless $N = 1000$ or more, it is hardly
worth while making the correction. For example, in Tables 11.1-11.6,
Tables 11.1, 11.5 and 11.6 have a frequency less than 1000 and the corrections
are not to be applied; in any case they would not be applied to
Tables 11.5 and 11.6, which violate the conditions as to “tapering off.”
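The effect of the correction on $r$ may be sketched as follows in Python. The values of $p$ and of the two variances are hypothetical, expressed in class-interval units so that $h = 1$ for both variates, and the function name is ours. Since only the standard deviations are reduced, the corrected coefficient is numerically somewhat larger than the uncorrected one.

```python
import math

def sheppard_corrected_r(p, var_x, var_y, h_x=1.0, h_y=1.0):
    """Correlation coefficient after subtracting h^2/12 from each variance.
    The product term p is left uncorrected, as stated in 11.29."""
    corrected_var_x = var_x - h_x ** 2 / 12
    corrected_var_y = var_y - h_y ** 2 / 12
    return p / math.sqrt(corrected_var_x * corrected_var_y)

# Hypothetical values in class-interval units (h = 1 for both variates).
p, var_x, var_y = 4.5, 4.0, 9.0

r_raw = p / math.sqrt(var_x * var_y)
r_corrected = sheppard_corrected_r(p, var_x, var_y)
print(r_raw, r_corrected)
```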

11.30. Finally, it should be borne in mind that any coefficient, e.g.
the coefficient of correlation or the coefficient of contingency, gives only a
part of the information afforded by the original data or the correlation 
table. The correlation table itself, or the original data if no correlation 
table has been compiled, should always be given, unless considerations of 
space or of expense absolutely preclude the adoption of such a course. 


SUMMARY. 

1. A universe every member of which bears one of the values of each 
of two variates is said to be bivariate. If the members are grouped 
according to class-intervals of the two variables, we have a bivariate 
frequency-distribution. 

2. The bivariate frequency-distribution may be represented by a 
frequency-surface or by a stereogram. Ungrouped data (and, less con¬ 
veniently, grouped data) can be represented on a scatter diagram. 

3. The means of arrays of a bivariate frequency-distribution may be 
represented as points by reference to a pair of rectangular axes along 
which are measured values of the variables. The means of rows and 
those of columns will in general lie respectively about two smooth curves,
called lines of regression. The equations of these curves are called 
regression equations. 1 

4. The regression equations may be regarded as expressions for 
estimating from a given value of one variate the average corresponding 
value of the other. 

¹ Curvilinear regression lines, like straight regression lines, may also be defined for
ungrouped data by an extension of the principle of making sums of squares of errors of
estimate a minimum.

5. The coefficient of correlation (product-moment correlation
coefficient) between two variables $X$ and $Y$ is given by:

$$r = \frac{S(xy)}{\sqrt{S(x^2)\,S(y^2)}} = \frac{p}{\sigma_x\sigma_y}$$

where $x$, $y$ are the values of the variables measured from their respective
means, and $p = \frac{1}{N}S(xy)$.

6. The correlation coefficient $r$ cannot be less than $-1$ or greater
than $+1$. If $r = \pm 1$ the variables are perfectly correlated, the points
corresponding to pairs of values $x$, $y$ all lying on a straight line. If
$r = -1$ the variables are perfectly negatively correlated, low values of
one corresponding to high values of the other. If $r = +1$ the variables
are perfectly positively correlated, high values of one corresponding to
high values of the other.

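7. The linear regression equation of $X$ on $Y$ (referred to axes through
their respective means) is

$$x = b_1 y$$

where

$$b_1 = r\frac{\sigma_x}{\sigma_y} = \frac{p}{\sigma_y^2}$$

and that of $Y$ on $X$ is

$$y = b_2 x$$

where

$$b_2 = r\frac{\sigma_y}{\sigma_x} = \frac{p}{\sigma_x^2}$$

$b_1$ and $b_2$ being called coefficients of regression, or simply regressions.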

8. The straight lines of regression are such that the sums of squares
of errors of estimate, $S(x - b_1 y)^2$ and $S(y - b_2 x)^2$, are a minimum. If the
quotients of these sums by $N$ are denoted by $s_x^2$, $s_y^2$,

$$s_x^2 = \sigma_x^2(1 - r^2)$$

$$s_y^2 = \sigma_y^2(1 - r^2)$$


EXERCISES. 

11.1. Find the correlation coefficient and the equations of regression for the
following values of X and Y:

X.    Y.
1     2
2     5
3     3
4     8
5     7

[As a matter of practice it is never worth calculating a correlation coefficient
for so few observations: the figures are given solely as a short example on
which the student can test his knowledge of the work.]

11.2. (Data from W. Little: Labour Commission Report, Vol. 5, Part 1, 1894,
and Official Returns.)

The following figures show (1) the estimated average earnings of agricultural
labourers, X, (2) the percentage of population in receipt of poor law relief, Y,
(3) the ratio of the number of paupers receiving outdoor relief to the number
receiving relief in workhouses, Z, for certain districts in England and Wales in
1898.

Find the correlations between X and Y, Y and Z, and Z and X. Draw
scatter diagrams to illustrate the various joint distributions.

Columns: Union; X, Estimated Average Earnings of Agricultural Labourers
(shillings and pence per week); Y, Percentage of Population in Receipt of Poor
Law Relief; Z, Ratio of Number of Paupers Receiving Outdoor Relief to the
Number Receiving Relief in Workhouses.

Union.                      X (s. d.)     Y.       Z.

 1. Glendale                 20  9       2.40     6.10
 2. Wigton                   20  3       2.29     4.04
 3. Garstang                 19  8       1.38     7.90
 4. Belper                   18  6       1.92     3.31
 5. Nantwich                 17  8       2.98     7.85
 6. Atcham                   17  6       1.17     0.45
 7. Driffield                17  1       3.79    10.00
 8. Uttoxeter                17  0       3.01     4.43
 9. Wetherby                 17  0       2.89     4.78
10. Easingwold               16 11       2.78     4.78
11. Southwell                16  6       3.09     6.00
12. Hollingbourn             16  4       2.78     1.22
13. Melton Mowbray           16  3       2.61     4.27
14. Truro                    16  3       4.33     7.50
15. Godstone                 16  0       3.02     4.44
16. Louth                    16  0       4.20     8.34
17. Brixworth                15  9       1.29     0.69
18. Crediton                 15  8       5.16     9.89
19. Holbeach                 15  6       4.76     4.00
20. Maldon                   15  6       4.61     6.02
21. Monmouth                 15  4       4.26     8.27
22. St. Neots                15  3       1.66     1.58
23. Swaffham                 15  0       5.37    10.04
24. Thakeham                 15  0       3.38     1.96
25. Thame                    15  0       5.84     9.28
26. Thingoe                  15  0       4.63     8.72
27. Basingstoke              15  0       3.93     2.97
28. Cirencester              15  0       4.51     5.38
29. North Witchford          14 10       3.42     3.24
30. Pewsey                   14  9       5.88     7.61
31. Bromyard                 14  9       4.36     5.87
32. Wantage                  14  9       3.85     5.50
33. Stratford-on-Avon        14  7       3.92     3.58
34. Dorchester               14  6       4.48     6.93
35. Woburn                   14  6       5.67     6.02
36. Buntingford              14  4       4.91     4.92
37. Pershore                 13  6       4.34     4.64
38. Langport                 12  6       5.19    10.56












11.3. Verify the following data for the under-mentioned tables of this
chapter. Calculate the means of rows and columns and draw a diagram showing
the lines of regression for the data of Table 11.1. (Sheppard's correction used
only in Table 11.4.)

                                  Table 11.1.   Table 11.3.   Table 11.4.     Table 11.6.

Mean of X                         55.3 mm.      67.70 in.     6.22 years      509.2
Mean of Y                         53.1 mm.      68.66 in.     18.61 galls.    14,500
Standard deviation of X           6.86 mm.      2.72 in.      2.21 years      7.46
Standard deviation of Y           5.77 mm.      2.75 in.      3.37 galls.     18,100
Coefficient of correlation        +0.97         +0.51         +0.22           -0.014
Coefficient of contingency        0.90          0.51          0.26            0.47
  (for the grouping stated
  below)






In calculating the coefficient of contingency (coefficient of mean square
contingency) use the following groupings, so as to avoid small scattered
frequencies at the extremities of the tables and also excessive arithmetic:

Table 11.1. Group together (1) two top rows, (2) three bottom rows, (3) two
first columns, (4) four last columns, leaving centre of table as it stands.

Table 11.3. Regroup by 2-inch intervals, 58.5-60.5, etc., for father, 59.5-61.5,
etc., for son. If a 3-inch grouping be used (58.5-61.5, etc., for both father and
son), the coefficient of mean square contingency is 0.465. (Both results cited
from Pearson, ref. (84).)

Table 11.4. For columns, group those headed 3 and 4, 5 and 6, 7 and 8, 9 and
10, 11 and over; for rows, group those headed 8-11, 12-13, 14-15, 16-17, 18-19,
20-21, 22-23, 24-25, 26-27, 28 and over.

Table 11.6. For columns, group all up to 494.5 and all over 521.5, leaving
central columns. Rows, single up to 20: then 20-28, 28-41, 41-50, 50 upwards.

11.4. (Data from Statistical Review of England and Wales for 1933, Tables,
Part 1, p. 3, and Part 2, p. 6.) The following show mean annual birth and death
rates in England and Wales for quinquennia since 1876. Find the correlation
between birth and death rates.

                 Mean Annual                 Mean Annual
                 Live Birth Rate             Death Rate
Period.          per 1000 of Population.     per 1000 of Population.

1876-80               35.3                        20.8
1881-85               33.5                        19.4
1886-90               31.4                        18.9
1891-95               30.5                        18.7
1896-1900             29.3                        17.7
1901-1905             28.2                        16.0
1906-1910             26.3                        14.7
1911-15               23.6                        14.3
1916-20               20.1                        14.4
1921-25               19.9                        12.2
1926-30               16.7                        12.1




11.5. The following figures (S. Rowson, Journ. Roy. Stat. Soc., vol. 99, 1936)
give the relationship between the density of population and seating capacity of
cinemas in various districts of Great Britain.

Find the correlation between density of population and proportion of cinemas
with (1) seating capacity 500 or less, (2) seating capacity 2000 or more.

                                        Density of          Percentage of Cinemas.
                                        Population          (1)              (2)
                                        per square          Seating 500      Seating 2000
District.                               mile.               or less.         or more.

Scotland                                  103                 13.4             4.3
North Wales                               165                 42.5             0.0
West of England                           3S0                 38.2             2.1
Eastern Counties                          431                 38.8             1.3
South Wales                               440                 22 ( 1           1.2
North of England                          487                 16.0             1.2
Yorkshire and district                    594                 15.5             3.1
Midlands                                  710                 20.2             1.6
Home Counties (excluding London)          7f4                 28.2             3.0
Lancashire                               2157                 13.5             3.6


11.6. Show that the coefficient of correlation is the geometric mean of the
coefficients of regression; verify from the data of Examples 11.1, 11.2 and 11.3
that the arithmetic mean of the coefficients of regression is greater than the
coefficient of correlation.

11.7. The tangent of the difference of angles A and B is given by

$$\tan(A - B) = \frac{\tan A - \tan B}{1 + \tan A \tan B}$$

Deduce that the smaller angle between regression lines is $\theta$, given by

$$\tan\theta = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$

and interpret this result when $r = 0$ and $r = \pm 1$.




CHAPTER 12. 


NORMAL CORRELATION. 

The Bivariate Normal Surface. 

12.1. Our study of the normal curve in Chapter 10 may be extended
to yield a corresponding expression for the frequency-distribution of pairs
of values of two variates. This bivariate normal distribution, known also
as “the bivariate normal surface,” “the normal correlation
surface” or simply “the normal surface,” occupies a central position
in the theory of bivariate frequency-distributions, and bears to them a
relation similar to that borne by the normal curve to the
frequency-distributions of a single variate.

The normal surface is of great historical importance, as the earlier
work on correlation is, almost without exception, based on the assumption
of such a distribution; though when it was recognised that the properties
of the correlation coefficient could be deduced, as in Chapter 11, without
reference to the form of the distribution of frequency, a knowledge of this
special type of frequency-surface ceased to be so essential. But the
generalised normal law is of importance in the theory of sampling: it
serves to describe very approximately certain actual distributions (e.g. of
measurements on man); and if it can be assumed to hold good, some of the
expressions in the theory of correlation, notably the standard deviations
of arrays (and, if more than two variables are involved, the partial
correlation coefficients), can be assigned more simple and definite meanings
than in the general case. The student should, therefore, be familiar with the
more fundamental properties of the distribution.

12.2. Consider first the case in which the two variables are
completely independent. Let the distributions of frequency for the two
variables $x_1$ and $x_2$ singly be given by

$$y_1 = y_1'\, e^{-\frac{x_1^2}{2\sigma_1^2}}, \qquad y_2 = y_2'\, e^{-\frac{x_2^2}{2\sigma_2^2}} \qquad (12.1)$$

Then, assuming independence, the frequency-distribution of pairs of values
must, by the rule of independence, be given by

$$y_{12} = y_{12}'\, e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right)} \qquad (12.2)$$

where

$$y_{12}' = \frac{y_1' y_2'}{N} = \frac{N}{2\pi\sigma_1\sigma_2} \qquad (12.3)$$

Equation (12.2) gives a normal correlation surface for one special case, the
correlation coefficient being zero. If we put $x_2$ = a constant, we see that
every section of the surface by a vertical plane parallel to the $x_1$-axis, i.e.
the distribution of any array of $x_1$'s, is a normal distribution, with the same
mean and standard deviation as the total distribution of $x_1$'s; and a similar
statement holds for the arrays of $x_2$'s: these properties must hold good,
of course, as the two variables are assumed independent (cf. 5.18). The
contour lines of the surface, that is to say, lines drawn on the surface at a
constant height, are a series of similar ellipses with major and minor axes
parallel to the axes of $x_1$ and $x_2$ and proportional to $\sigma_1$ and $\sigma_2$, the equations
to the contour lines being of the general form

$$\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2} = c^2 \qquad (12.4)$$

Pairs of values of $x_1$ and $x_2$ related by an equation of this form are, therefore,
equally frequent.

12.3. Now suppose we have two correlated variates $x_1$ and $x_2$, and let
the regression of $x_1$ on $x_2$ be $b_{12}$, and that of $x_2$ on $x_1$ be $b_{21}$. Let $r_{12}$ be the
coefficient of correlation between $x_1$ and $x_2$.

Consider the new variates defined by the equations

$$x_{1.2} = x_1 - b_{12}x_2$$
$$x_{2.1} = x_2 - b_{21}x_1$$

This is a notation which we shall later extend considerably.
Then $x_1$ and $x_{2.1}$ are uncorrelated, as are $x_2$ and $x_{1.2}$.
For

$$S(x_1 x_{2.1}) = S\{x_1(x_2 - b_{21}x_1)\}$$
$$= S(x_1 x_2) - b_{21}S(x_1^2)$$
$$= N\left(r_{12}\sigma_1\sigma_2 - r_{12}\frac{\sigma_2}{\sigma_1}\sigma_1^2\right) = 0$$

and similarly for $S(x_2 x_{1.2})$.

Writing $\sigma_1$, $\sigma_2$ for the standard deviations of $x_1$, $x_2$, we see that the
standard deviation $\sigma_{1.2}$ of $x_{1.2}$ is given by

$$\sigma_{1.2}^2 = \frac{1}{N}S(x_1 - b_{12}x_2)^2$$
$$= \sigma_1^2 - 2b_{12}r_{12}\sigma_1\sigma_2 + b_{12}^2\sigma_2^2$$
$$= \sigma_1^2 - 2r_{12}^2\sigma_1^2 + r_{12}^2\sigma_1^2$$
$$= \sigma_1^2(1 - r_{12}^2)$$

and similarly $\sigma_{2.1}$, the standard deviation of $x_{2.1}$, is given by

$$\sigma_{2.1}^2 = \sigma_2^2(1 - r_{12}^2)$$

We obtained these results in a slightly different form in 11.22 and
11.24.
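The algebra of 12.3 may be checked numerically. The following Python sketch is an illustration only: the observations are hypothetical (they are not data from the text) and the variable names are chosen here for convenience. It forms the variates $x_{1.2}$ and $x_{2.1}$ from deviations about the means and confirms, to rounding error, that $S(x_1 x_{2.1})$ and $S(x_2 x_{1.2})$ vanish and that the residual variances equal $\sigma_1^2(1 - r_{12}^2)$ and $\sigma_2^2(1 - r_{12}^2)$.

```python
import math

# Hypothetical paired observations (illustrative only, not from the text).
x1_raw = [2.0, 5.0, 3.0, 8.0, 7.0, 6.0]
x2_raw = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]
n = len(x1_raw)

# Deviations from the respective means, as used throughout the chapter.
m1 = sum(x1_raw) / n
m2 = sum(x2_raw) / n
x1 = [v - m1 for v in x1_raw]
x2 = [v - m2 for v in x2_raw]

s11 = sum(a * a for a in x1) / n                 # sigma_1 squared
s22 = sum(b * b for b in x2) / n                 # sigma_2 squared
s12 = sum(a * b for a, b in zip(x1, x2)) / n     # mean product
r12 = s12 / math.sqrt(s11 * s22)                 # coefficient of correlation

b12 = s12 / s22   # regression of x1 on x2, equal to r12 * sigma_1 / sigma_2
b21 = s12 / s11   # regression of x2 on x1, equal to r12 * sigma_2 / sigma_1

x1_2 = [a - b12 * b for a, b in zip(x1, x2)]     # x_{1.2}
x2_1 = [b - b21 * a for a, b in zip(x1, x2)]     # x_{2.1}

cross_a = sum(a * e for a, e in zip(x1, x2_1))   # S(x1 x_{2.1})
cross_b = sum(b * e for b, e in zip(x2, x1_2))   # S(x2 x_{1.2})
var_1_2 = sum(e * e for e in x1_2) / n           # sigma_{1.2} squared
var_2_1 = sum(e * e for e in x2_1) / n           # sigma_{2.1} squared

print("S(x1 x2.1) =", cross_a, " S(x2 x1.2) =", cross_b)
print("sigma_1.2^2 =", var_1_2, " sigma_1^2 (1 - r^2) =", s11 * (1 - r12 ** 2))
print("sigma_2.1^2 =", var_2_1, " sigma_2^2 (1 - r^2) =", s22 * (1 - r12 ** 2))
```

The identities hold for any set of paired values, since they depend only on the definitions of the regressions and not on the form of the distribution.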





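12.4. Suppose further that $x_1$ and $x_{2.1}$ are not only uncorrelated, but
independent, and that each is normally distributed.

In accordance with equation (12.2), we must have for the
frequency-distribution of pairs of deviations of $x_1$ and $x_{2.1}$

$$y_{12} = y_{12}'\, e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_{2.1}^2}{\sigma_{2.1}^2}\right)} \qquad (12.5)$$

But

$$\frac{x_1^2}{\sigma_1^2} + \frac{x_{2.1}^2}{\sigma_{2.1}^2} = \frac{x_1^2}{\sigma_1^2} + \frac{(x_2 - b_{21}x_1)^2}{\sigma_2^2(1 - r_{12}^2)}$$
$$= \frac{x_1^2}{\sigma_1^2(1 - r_{12}^2)} + \frac{x_2^2}{\sigma_2^2(1 - r_{12}^2)} - 2r_{12}\frac{x_1 x_2}{\sigma_1\sigma_2(1 - r_{12}^2)}$$
$$= \frac{x_1^2}{\sigma_{1.2}^2} + \frac{x_2^2}{\sigma_{2.1}^2} - 2r_{12}\frac{x_1 x_2}{\sigma_{1.2}\sigma_{2.1}}$$

Evidently we should also have arrived at precisely the same expression
if we had taken the distribution of frequency for $x_2$ and $x_{1.2}$, and reduced
the exponent

$$\frac{x_2^2}{\sigma_2^2} + \frac{x_{1.2}^2}{\sigma_{1.2}^2}$$

We have, therefore, the general expression for the normal correlation
surface for two variables:

$$y_{12} = y_{12}'\, e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_{1.2}^2} + \frac{x_2^2}{\sigma_{2.1}^2} - 2r_{12}\frac{x_1 x_2}{\sigma_{1.2}\sigma_{2.1}}\right)} \qquad (12.6)$$

Further, since $x_1$ and $x_{2.1}$, $x_2$ and $x_{1.2}$, are independent, we must have:

$$y_{12}' = \frac{N}{2\pi\sigma_1\sigma_{2.1}} = \frac{N}{2\pi\sigma_2\sigma_{1.2}} = \frac{N}{2\pi\sigma_1\sigma_2(1 - r_{12}^2)^{\frac{1}{2}}} \qquad (12.7)$$

Expressing $\sigma_{1.2}$ and $\sigma_{2.1}$ in terms of $\sigma_1$, $\sigma_2$ and $r_{12}$, we have the
alternative form

$$y_{12} = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1 - r_{12}^2}}\, e^{-\frac{1}{2(1 - r_{12}^2)}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2} - 2r_{12}\frac{x_1 x_2}{\sigma_1\sigma_2}\right)} \qquad (12.8)$$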
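As a check on the equivalence of the two forms, the ordinate may be evaluated both from (12.6) with (12.7) and from (12.8). The Python sketch below is illustrative only; the function names `surface_126` and `surface_128` are assumptions made here, not notation from the text, and the constants in the usage lines are those quoted for the stature data in 12.16.

```python
import math

def surface_128(x1, x2, n, s1, s2, r):
    """Ordinate of the normal surface in the form (12.8)."""
    k = 1.0 - r * r
    q = (x1 / s1) ** 2 + (x2 / s2) ** 2 - 2.0 * r * x1 * x2 / (s1 * s2)
    return n / (2.0 * math.pi * s1 * s2 * math.sqrt(k)) * math.exp(-q / (2.0 * k))

def surface_126(x1, x2, n, s1, s2, r):
    """Ordinate in the form (12.6), with the central ordinate from (12.7)."""
    s1_2 = s1 * math.sqrt(1.0 - r * r)   # sigma_{1.2}
    s2_1 = s2 * math.sqrt(1.0 - r * r)   # sigma_{2.1}
    y0 = n / (2.0 * math.pi * s1 * s2_1)
    q = (x1 / s1_2) ** 2 + (x2 / s2_1) ** 2 - 2.0 * r * x1 * x2 / (s1_2 * s2_1)
    return y0 * math.exp(-q / 2.0)

# Constants quoted in 12.16 for stature of father (1) and son (2).
n, s1, s2, r = 1078, 2.72, 2.75, 0.51
for point in [(0.0, 0.0), (1.0, 2.0), (-3.0, 1.5)]:
    print(point, surface_126(*point, n, s1, s2, r), surface_128(*point, n, s1, s2, r))
```

The two columns printed should agree at every point, and the value at the origin is the central ordinate of (12.7).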


Properties of the Normal Surface. 

12.5. For any given value $h_2$ of $x_2$ the distribution of the array of
$x_1$'s is given by

$$y_{12} = y_{12}'\, e^{-\frac{1}{2(1 - r_{12}^2)}\left(\frac{x_1^2}{\sigma_1^2} + \frac{h_2^2}{\sigma_2^2} - 2r_{12}\frac{x_1 h_2}{\sigma_1\sigma_2}\right)}$$
$$= y_{12}'\, e^{-\frac{h_2^2}{2\sigma_2^2}}\, e^{-\frac{\left(x_1 - r_{12}\frac{\sigma_1}{\sigma_2}h_2\right)^2}{2\sigma_{1.2}^2}}$$

This is a normal distribution of standard deviation $\sigma_{1.2}$, with a mean
deviating by $r_{12}\dfrac{\sigma_1}{\sigma_2}h_2$ from the mean of the whole distribution of $x_1$'s.

Hence, since $h_2$ may be any value, we have the important results:

(1) that the standard deviations of all arrays of $x_1$ are the same, and
equal to $\sigma_{1.2}$;

(2) that the regression of $x_1$ on $x_2$ is strictly linear.

Similarly, it follows that the s.d.'s of all arrays of $x_2$ are equal to $\sigma_{2.1}$,
and that the regression of $x_2$ on $x_1$ is linear.
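The two results of 12.5 may be illustrated by direct summation along an array. The sketch below is an assumption-laden illustration, not part of the original argument: the helper `array_mean_and_sd`, its step length and its range of summation are choices made here. It sums the ordinates of (12.8) along the line $x_2 = h_2$ by the midpoint rule and compares the mean and standard deviation of the array with $r_{12}\sigma_1 h_2/\sigma_2$ and $\sigma_1\sqrt{1 - r_{12}^2}$.

```python
import math

def y12(x1, x2, n, s1, s2, r):
    """Ordinate of the normal surface, form (12.8)."""
    k = 1.0 - r * r
    q = (x1 / s1) ** 2 + (x2 / s2) ** 2 - 2.0 * r * x1 * x2 / (s1 * s2)
    return n / (2.0 * math.pi * s1 * s2 * math.sqrt(k)) * math.exp(-q / (2.0 * k))

def array_mean_and_sd(h2, n, s1, s2, r, step=0.01, span=12.0):
    """Mean and s.d. of the x1-array at x2 = h2, by midpoint-rule sums."""
    lower = -span * s1                      # summation runs over +/- span * sigma_1
    count = int(round(2.0 * span * s1 / step))
    total = first = second = 0.0
    for i in range(count):
        x1 = lower + (i + 0.5) * step
        w = y12(x1, h2, n, s1, s2, r)
        total += w
        first += w * x1
        second += w * x1 * x1
    mean = first / total
    return mean, math.sqrt(second / total - mean * mean)

n, s1, s2, r = 1078, 2.72, 2.75, 0.51       # constants quoted in 12.16
for h2 in (-3.0, 1.0, 2.0):
    mean, sd = array_mean_and_sd(h2, n, s1, s2, r)
    print(h2, mean, r * s1 / s2 * h2, sd, s1 * math.sqrt(1.0 - r * r))
```

For every value of $h_2$ tried, the standard deviation of the array should be the same, while the mean moves in proportion to $h_2$.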

12.6. The contour lines are, as in the case of independence, a series
of concentric and similar ellipses; the major and minor axes are, however,
no longer parallel to the axes of $x_1$ and $x_2$, but make a certain angle with
them. Fig. 12.1 illustrates the calculated form of the contour lines for
one case, RR and CC being the lines of regression. As each line of
regression cuts every array of $x_1$ or of $x_2$ in its mean, and as the distribution
of every array is symmetrical about its mean, RR must bisect every
horizontal chord and CC every vertical chord, as illustrated by the two
chords shown by dotted lines; it also follows that RR cuts all the ellipses
in the points of contact of the horizontal tangents to the ellipses, and CC in
the points of contact of the vertical tangents. The surface or solid itself,
somewhat truncated, is shown in fig. 11.1, page 204.

Fig. 12.1. Principal Axes and Contour Lines of the Normal
Correlation Surface.

12.7. Since, as we see from fig. 12.1, a normal surface for two correlated
variables may be regarded merely as a certain surface for which r is zero
turned round through some angle, and since for every angle through which
it is turned the distributions of all $x_1$ arrays and $x_2$ arrays are normal, it
follows that every section of a normal surface by a vertical plane is a normal
curve, i.e. the distributions of arrays taken at any angle across the surface
are normal.


12.8. It also follows that, since the total distributions of $x_1$ and $x_2$
must be normal for every angle through which the surface is turned, the
distributions of totals given by slices or arrays taken at any angle across a
normal surface must be normal distributions. But these would give the
distributions of functions like $ax_1 + bx_2$, and consequently (1) the
distribution of any linear function of two normally distributed variables $x_1$
and $x_2$ must also be normal; (2) the correlation between any two linear
functions of two normally distributed variables must be normal correlation.

Result (1) is very important, and may easily be extended to
cover the case of n variables $x_1 \ldots x_n$. Suppose, in fact, we have
n such variables each of which is normally distributed, and a linear
function $ax_1 + bx_2 + \ldots + hx_n$. Since $ax_1 + bx_2$ is normally distributed,
$(ax_1 + bx_2) + cx_3$ is normally distributed, and hence so is $(ax_1 + bx_2 + cx_3) + dx_4$,
and so on. Thus the function $ax_1 + \ldots + hx_n$ is normally distributed.

Hence, the sum of n normal variates is distributed normally; and in
particular the mean of n normal variates is distributed normally. More
particularly still, the mean of samples of n from a normal universe is
normally distributed.

12.9. Returning to the normal surface, it is interesting to inquire
what is the angle $\theta$ through which the surface has been turned from the
position for which the correlation was zero. The major and minor axes
of the ellipses are sometimes termed the principal axes. If $\xi_1$, $\xi_2$ be
the co-ordinates referred to the principal axes (the $\xi_1$-axis being the
$x_1$-axis in its new position), we have for the relation between $\xi_1$, $\xi_2$, $x_1$, $x_2$,
the angle $\theta$ being taken as positive for a rotation of the $x_1$-axis which will
make it, if continued through 90°, coincide in direction and sense with the
$x_2$-axis,

$$\xi_1 = x_1\cos\theta + x_2\sin\theta$$
$$\xi_2 = x_2\cos\theta - x_1\sin\theta \qquad (12.9)$$

But, since $\xi_1$, $\xi_2$ are uncorrelated, $S(\xi_1\xi_2) = 0$. Hence, multiplying together
equations (12.9) and summing,

$$0 = (\sigma_2^2 - \sigma_1^2)\sin 2\theta + 2r_{12}\sigma_1\sigma_2\cos 2\theta$$

$$\tan 2\theta = \frac{2r_{12}\sigma_1\sigma_2}{\sigma_1^2 - \sigma_2^2} \qquad (12.10)$$

It should be noticed that if we define the principal axes of any distribution
for two variables as being a pair of axes at right angles for which the
variables $\xi_1$, $\xi_2$ are uncorrelated, equation (12.10) gives the angle that they
make with the axes of measurement whether the distribution be normal
or not.

12.10. The two standard deviations, say $\Sigma_1$ and $\Sigma_2$, about the
principal axes are of some interest, for evidently from 12.2 the major and
minor axes of the contour ellipses are proportional to these two standard
deviations. They may be most readily determined as follows. Squaring
the two transformation equations (12.9), summing and adding, we have:

$$\Sigma_1^2 + \Sigma_2^2 = \sigma_1^2 + \sigma_2^2 \qquad (12.11)$$

Referring the surface to the axes of measurement, we have for the central
ordinate, by equation (12.7),

$$y_{12}' = \frac{N}{2\pi\sigma_1\sigma_2(1 - r_{12}^2)^{\frac{1}{2}}}$$

Referring it to the principal axes, by equation (12.3),

$$y_{12}' = \frac{N}{2\pi\Sigma_1\Sigma_2}$$

But these two values of the central ordinate must be equal, therefore

$$\Sigma_1\Sigma_2 = \sigma_1\sigma_2(1 - r_{12}^2)^{\frac{1}{2}} \qquad (12.12)$$

(12.11) and (12.12) are a pair of simultaneous equations from which $\Sigma_1$ and
$\Sigma_2$ may be very simply obtained in any arithmetical case. Care must,
however, be taken to give the correct signs to the square root in solving.
$\Sigma_1 + \Sigma_2$ is necessarily positive, and $\Sigma_1 - \Sigma_2$ also if r is positive, the major
axes of the ellipses lying along $\xi_1$; but if r be negative, $\Sigma_1 - \Sigma_2$ is also
negative. It should be noted that, while we have deduced (12.12) from
a simple consideration depending on the normality of the distribution, it
is really of general application (like equation (12.11)), and may be obtained
at somewhat greater length from the equations for transforming co-ordinates.
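Equations (12.10), (12.11) and (12.12) may be verified together by carrying out the rotation (12.9) on the second moments. The Python sketch below is an illustration under stated assumptions: the values of $\sigma_1$, $\sigma_2$ and r are hypothetical, the function name `principal_axes` is introduced here, and the root of (12.10) is selected with `math.atan2`, a choice which always places the $\xi_1$-axis along the major axis and therefore differs from the sign convention of the text when r is negative.

```python
import math

def principal_axes(s1, s2, r):
    """Angle of rotation and second moments about the rotated axes of (12.9)."""
    # One root of tan(2 theta) = 2 r s1 s2 / (s1^2 - s2^2); atan2 fixes the quadrant.
    theta = 0.5 * math.atan2(2.0 * r * s1 * s2, s1 * s1 - s2 * s2)
    c, s = math.cos(theta), math.sin(theta)
    p = r * s1 * s2                                   # mean product of x1 and x2
    var_xi1 = s1 * s1 * c * c + s2 * s2 * s * s + 2.0 * p * s * c
    var_xi2 = s2 * s2 * c * c + s1 * s1 * s * s - 2.0 * p * s * c
    cov_xi = p * (c * c - s * s) + (s2 * s2 - s1 * s1) * s * c
    return theta, math.sqrt(var_xi1), math.sqrt(var_xi2), cov_xi

s1, s2, r = 3.0, 2.0, 0.6                             # hypothetical constants
theta, big_s1, big_s2, cov_xi = principal_axes(s1, s2, r)
print("theta (degrees)      :", math.degrees(theta))
print("mean product of xi's :", cov_xi)
print("(12.11):", big_s1 ** 2 + big_s2 ** 2, "=", s1 ** 2 + s2 ** 2)
print("(12.12):", big_s1 * big_s2, "=", s1 * s2 * math.sqrt(1.0 - r * r))
```

No assumption of normality enters the computation, in agreement with the remark that (12.11) and (12.12) are of general application.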

12.11. As an example of the application of the foregoing theory to
a practical case, we proceed to consider the distribution of Table 11.3,
page 199, showing the correlation between stature of father and son, and
to test, as far as we can by elementary methods, whether a normal surface
will fit the data.

12.12. The first important property of the normal distribution is the
linearity of regression. This was well illustrated for these data in fig. 11.8
(p. 211). Subject to some investigation as to the deviations from strict
linearity which may occur as the result of sampling fluctuations, we may
conclude that the regression is appreciably linear. We shall consider a
test of linearity in later chapters (see Chapter 28).

12.13. The second important property is the constancy of the 
standard deviation for all parallel arrays. 

The standard deviations of the ten columns from that headed 62.5-63.5
onwards are:

2*56 2-60 

211 2-26 

2-55 2-20 

2 21 2 15 

2*28 2*88 





the mean being 2*80. The standard deviations again only fluctuate 
irregularly round their mean value. The mean of the first five is 2*84, of 
the second five 2*88, a difference of only 0 04 ; of the first group, two are 
greater and three are less than the mean, and the same is true of the second 
group. There does not seem to be any indication of a general tendency 
for the standard deviation to increase or decrease as we pass from one end 
of the table to the other. We are not yet in a position to test how far the 
differences from the average standard deviation might have arisen in 
sampling from a record in which the distribution was strictly normal, but, 
as a fact, a rough test suggests that they might have done so.

12.14. Next we note that the distributions of all arrays of a normal
surface should themselves be normal. Owing, however, to the small
numbers of observations in any array, the distributions of arrays are very
irregular, and their normality cannot be tested in any very satisfactory
way; we can only say that they do not exhibit any marked or regular
asymmetry. But we can test the allied property of a normal correlation
table, viz. that the totals of arrays must give a normal distribution even
if the arrays be taken diagonally across the surface, and not parallel to
either axis of measurement. From an ordinary correlation table we
cannot find the totals of such diagonal arrays exactly, but the totals of
arrays at an angle of 45° will be given with sufficient accuracy for our
present purpose by the totals of lines of diagonally adjacent compartments.
Referring again to Table 11.3, and forming the totals of such diagonals
(running up from left to right), we find, starting at the top left-hand corner
of the table, the following distribution:


0*25 

78*75 

2 

81*25 

8*25 

60*5 

0*25 

59*25 

8 

42*25 

9*75 

80*75 

17 

29*25 

84*5 

19 

42 

10*75 

40*25 

7 

00*5 

4*25 

07*5 

8*5 

85*75 

1*75 

87*25 

1 

78 

0*25 

94*25 

- 


Total 1078 


The mean of this distribution is at 0.859 of an interval above the centre of
the interval with frequency 78; its standard deviation is 4.757 intervals, or,
remembering that the interval is $1/\sqrt{2}$ of an inch, 3.364 inches. (This
value may be checked directly from the constants for the table given in
Exercise 11.3, page 225, for we have, from the first of the transformation
equations (12.9),

$$\sigma_\xi^2 = \sigma_1^2\cos^2\theta + \sigma_2^2\sin^2\theta + 2r_{12}\sigma_1\sigma_2\sin\theta\cos\theta$$

and inserting $\sigma_1 = 2.72$, $\sigma_2 = 2.75$, $r_{12} = 0.51$, $\sin\theta = \cos\theta = 1/\sqrt{2}$, find
$\sigma_\xi = 3.361$.) Drawing a diagram and fitting a normal curve, we have
fig. 12.2; the distribution is rather irregular but the fit is fair; certainly
there is no marked asymmetry, and, so far as the graphical test goes, the
distribution may be regarded as appreciably normal. One of the greatest
divergences of the actual distribution from the normal curve occurs in the
almost central interval with frequency 78; the difference between the
observed and calculated frequencies is here 12 units, but nevertheless it
may well have occurred as a fluctuation of sampling. In fact, anticipating
our discussion of the use of the standard error (standard deviation of
simple sampling) in testing the significance of sampling fluctuations
(19.4), we may note that the standard error in this case is $\sqrt{npq}$, where
n is the number of observations and p and q the chances of an individual
falling or not falling within the given interval. p may be taken as 90/1078,
and therefore the standard error is

$$\sqrt{\frac{90}{1078}\times\frac{988}{1078}\times 1078} = 9.1$$

The observed deviation, 12, is not much greater than this and may
therefore have occurred as a sampling fluctuation. We have used here the
exact expression for the standard error, but since p is small we might
have used the approximation $\sqrt{pn} = \sqrt{90} = 9.5$. This last is useful as
giving a test which can be applied on sight.

Fig. 12.2. Distribution of Frequency obtained by Addition of Table 11.3 along
Diagonals running up from left to right, fitted with a Normal Curve.
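The arithmetic of the standard error may be reproduced as follows. This short Python sketch is illustrative only; the variable names are introduced here, and the figures n = 1078 and p = 90/1078 are those stated in the text.

```python
import math

n = 1078                  # number of observations
p = 90 / n                # chance of falling in the given interval (calculated frequency 90)
q = 1.0 - p               # chance of not falling in it

se_exact = math.sqrt(n * p * q)     # exact expression, sqrt(npq)
se_approx = math.sqrt(n * p)        # approximation for small p, sqrt(pn)
deviation = 12                      # observed minus calculated frequency

print("sqrt(npq) =", round(se_exact, 1))
print("sqrt(pn)  =", round(se_approx, 1))
print("deviation / standard error =", round(deviation / se_exact, 2))
```

The ratio of the deviation to its standard error is a little over unity, which is the sense in which the deviation is "not much greater" than the standard error.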

12.15. So far, we have seen (1) that the regression is approximately
linear; (2) that, in the arrays which we have tested, the standard
deviations are approximately constant, or at least that their differences
are only small, irregular and fluctuating; (3) that the distribution of
totals for one set of diagonal arrays is approximately normal. These
results suggest, though they cannot completely prove, that the whole
distribution of frequency may be regarded as approximately normal,
within the limits of fluctuations of sampling. We may therefore apply a
more searching test, viz. the form of the contour lines and the closeness
of their fit to the contour ellipses of the normal surface. It may, however,
be seen that no very close fit can be expected. Since the frequencies in
the compartments of the table are small, the standard error of any
frequency is given approximately by its square root (19.15), and this
implies a standard error of about 5 units at the centre of the table, 3 units
for a frequency of 9, or 2 units for a frequency of 4: fluctuations of these
magnitudes are quite possible and might cause wide divergences in the
corresponding contour lines.

12.16. Using the suffix 1 to denote the constants relating to the
distribution of stature for fathers, and 2 the same constants for the sons,

$$N = 1078 \qquad M_1 = 67.70 \qquad M_2 = 68.66$$
$$\sigma_1 = 2.72 \qquad \sigma_2 = 2.75 \qquad r_{12} = 0.51$$

Hence we have from equation (12.7),

$$y_{12}' = 26.7$$

and the complete expression for the fitted normal surface is

$$y_{12} = 26.7\, e^{-\frac{1}{2}\left(\frac{x_1^2}{5.47} + \frac{x_2^2}{5.60} - \frac{x_1 x_2}{5.43}\right)}$$

The equation to any contour ellipse will be given by equating the index
of e to a constant, but it is very much easier to draw the ellipses if we refer
them to their principal axes. To do this we must first determine $\theta$, $\Sigma_1$
and $\Sigma_2$. From (12.10),

$$\tan 2\theta = -46.49$$

whence $2\theta = 91^\circ\,14'$, $\theta = 45^\circ\,37'$, the principal axes standing very nearly
at an angle of 45° with the axes of measurement, owing to the two standard
deviations being very nearly equal. They should be set off on the diagram,
not with a protractor, but by taking $\tan\theta$ from the tables (1.022) and
calculating points on each axis on either side of the mean.
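The constants of the fitted surface may be recomputed from $N$, $\sigma_1$, $\sigma_2$ and $r_{12}$ alone. The Python sketch below is offered as an illustration; the variable names are introduced here, and the use of `math.atan2` to place $2\theta$ in the second quadrant is a choice made for this sketch (it reproduces the value of $2\theta$ given in the text).

```python
import math

n, s1, s2, r = 1078, 2.72, 2.75, 0.51   # constants of 12.16
k = 1.0 - r * r

# Central ordinate, equation (12.7).
y0 = n / (2.0 * math.pi * s1 * s2 * math.sqrt(k))

# Divisors appearing in the index of e of the fitted surface.
div_x1 = k * s1 * s1                     # divisor of x1^2
div_x2 = k * s2 * s2                     # divisor of x2^2
div_x1x2 = k * s1 * s2 / (2.0 * r)       # divisor of x1 x2

# Angle of the principal axes, equation (12.10).
tan_2theta = 2.0 * r * s1 * s2 / (s1 * s1 - s2 * s2)
two_theta = math.degrees(math.atan2(2.0 * r * s1 * s2, s1 * s1 - s2 * s2))
theta = two_theta / 2.0
tan_theta = math.tan(math.radians(theta))

print("central ordinate :", round(y0, 1))
print("divisors         :", round(div_x1, 2), round(div_x2, 2), round(div_x1x2, 2))
print("tan 2theta       :", round(tan_2theta, 2))
print("2theta, theta    :", round(two_theta, 2), round(theta, 2), "degrees")
print("tan theta        :", round(tan_theta, 3))
```

Since $\sigma_1^2 - \sigma_2^2$ is small and negative, $\tan 2\theta$ is large and negative, and $2\theta$ falls just beyond 90°.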

To obtain $\Sigma_1$ and $\Sigma_2$ we have, from (12.11) and (12.12),

$$\Sigma_1^2 + \Sigma_2^2 = 14.961$$
$$2\Sigma_1\Sigma_2 = 12.868$$

Adding and subtracting these equations from each other and taking the
square root,

$$\Sigma_1 + \Sigma_2 = 5.275$$
$$\Sigma_1 - \Sigma_2 = 1.447$$

whence $\Sigma_1 = 3.36$, $\Sigma_2 = 1.91$; owing to the principal axes standing nearly
at 45° the first value is sensibly the same as that found for $\sigma_\xi$ in 12.14.
The equations to the contour ellipses, referred to the principal axes, may
therefore be written in the form

$$\frac{\xi_1^2}{(3.36)^2} + \frac{\xi_2^2}{(1.91)^2} = c^2$$

the major and minor semi-axes being $3.36 \times c$ and $1.91 \times c$ respectively. To
find c for any assigned value of the frequency $y_{12}$ we have:

$$c^2 = \frac{2(\log y_{12}' - \log y_{12})}{\log e}$$

Supposing that we desire to draw the three contour ellipses for y = 5,
10 and 20, we find c = 1.83, 1.40 and 0.76, or the following values for the
major and minor axes of the ellipses: semi-major axes, 6.15, 4.70, 2.55;
semi-minor axes, 3.50, 2.67, 1.45. The ellipses drawn with these axes
are shown in fig. 12.3, very much reduced, of course, from the original
drawing, one of the squares shown representing a square inch on the
original. The actual contour lines for the same frequencies are shown
by the irregular polygons superposed on the ellipses, the points on these
polygons having been obtained by simple graphical interpolation between
the frequencies in each row and each column, diagonal interpolation
between the frequencies in a row and the frequencies in a column not
being used. It will be seen that the fit of the two lower contours is, on
the whole, fair, especially considering the high standard errors. In the
case of the central contour, y = 20, the fit looks very poor to the eye, but
if the ellipse be compared carefully with the table, the figures suggest
that here again we have only to deal with the effects of fluctuations of
sampling. For father's stature = 66 in., son's stature = 70 in., there is a
frequency of 18.75, and an increase in this much less than the standard
error would bring the actual contour outside the ellipse. Again, for
father's stature = 68 in., son's stature = 71 in., there is a frequency of 19,
and an increase of a single unit would give a point on the actual contour
below the ellipse. Taking the results as a whole, the fit must be considered
quite as good as we could expect with such small frequencies.

Fig. 12.3. Contour Lines for the Frequencies 5, 10 and 20 of the Distribution of
Table 11.3, and corresponding Contour Ellipses of the Fitted Normal Surface.
$P_1P_1$, $P_2P_2$, principal axes; M, mean. (Horizontal axis: Stature of Father, inches.)
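The standard deviations about the principal axes and the semi-axes of the contour ellipses in 12.16 may be recomputed as below. The sketch is illustrative and its names (`contour_c`, `big_s1`, `big_s2`) are introduced here; it works with unrounded intermediate values, so the last digit of a semi-axis may differ by a unit from the figures of the text, which are formed from rounded values of c, $\Sigma_1$ and $\Sigma_2$.

```python
import math

n, s1, s2, r = 1078, 2.72, 2.75, 0.51   # constants of 12.16

# Equations (12.11) and (12.12).
sum_of_squares = s1 * s1 + s2 * s2
twice_product = 2.0 * s1 * s2 * math.sqrt(1.0 - r * r)

plus = math.sqrt(sum_of_squares + twice_product)    # Sigma_1 + Sigma_2
minus = math.sqrt(sum_of_squares - twice_product)   # Sigma_1 - Sigma_2 (positive, r > 0)
big_s1 = (plus + minus) / 2.0
big_s2 = (plus - minus) / 2.0

# Central ordinate, equation (12.7).
y0 = n / (2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - r * r))

def contour_c(y):
    """Value of c for the contour of frequency y: y = y0 * exp(-c^2 / 2)."""
    return math.sqrt(2.0 * math.log(y0 / y))

print("Sigma_1 + Sigma_2 =", round(plus, 3), " Sigma_1 - Sigma_2 =", round(minus, 3))
print("Sigma_1 =", round(big_s1, 2), " Sigma_2 =", round(big_s2, 2))
for y in (5, 10, 20):
    c = contour_c(y)
    print("y =", y, " c =", round(c, 2),
          " semi-major =", round(big_s1 * c, 2),
          " semi-minor =", round(big_s2 * c, 2))
```

Natural logarithms are used here, which is equivalent to dividing the difference of common logarithms by log e as in the formula for $c^2$.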

It is perhaps of historical interest to note that Sir Francis Galton,
working without a knowledge of the theory of normal correlation,
suggested that the contour lines of a similar table for the inheritance of
stature seemed to be closely represented by a series of concentric and
similar ellipses (ref. (250)): the suggestion was confirmed when he handed
the problem, in abstract terms, to a mathematician, J. D. Hamilton
Dickson (ref. (252)), asking him to investigate “the Surface of Frequency
of Error that would result from these data, and the various shapes and
other particulars of its sections that were made by horizontal planes.”

Isotropic Character of the Normal Surface. 

12.17. The normal distribution of frequency for two variables is
an isotropic distribution, to which all the theorems of 5.16 apply.
For if we isolate the four compartments of the correlation table common
to the rows and columns centring round values of the variables
$x_1$, $x_2$, $x_1'$, $x_2'$, we have for the ratio of the cross-products (frequency of
$x_1$, $x_2$ multiplied by frequency of $x_1'$, $x_2'$, divided by frequency of $x_1'$, $x_2$
multiplied by frequency of $x_1$, $x_2'$),

$$e^{\frac{r_{12}}{\sigma_1\sigma_2(1 - r_{12}^2)}(x_1' - x_1)(x_2' - x_2)}$$

Assuming that $x_1' - x_1$ has been taken of the same sign as $x_2' - x_2$, the
exponent is of the same sign as $r_{12}$. Hence, the association for this group
of four frequencies is also of the same sign as $r_{12}$, the ratio of the
cross-products being unity, or the association zero, if $r_{12}$ is zero. In a normal
distribution, the association is therefore of the same sign (the sign of
$r_{12}$) for every tetrad of frequencies in the compartments common to
two rows and two columns; that is to say, the distribution is isotropic.
It follows that every grouping of a normal distribution is isotropic whether
the class-intervals are equal or unequal, large or small, and the sign of the
association for a normal distribution grouped down to 2 x 2-fold form
must always be the same whatever the axes of division chosen.

12.18. These theorems are of importance in the applications of the
theory of normal correlation to the treatment of qualitative characters
which are subjected to a manifold classification. The contingency tables
for such characters are sometimes regarded as groupings of a normal
distribution of frequency, and the coefficient of correlation is determined
on this hypothesis by a rather lengthy procedure (see below, 13.23,
page 251). Before applying this procedure it is well, therefore, to see
whether the distribution of frequency may be regarded as approximately
isotropic, or reducible to isotropic form by some alteration in the order
of rows and columns (5.16 and 5.17). If only reducible to isotropic
form by some rearrangement, this rearrangement should be effected before
grouping the table to 2 x 2-fold form for the calculation of the correlation
coefficient by the process referred to. If the table is not reducible to
isotropic form by any rearrangement, the process of calculating the
coefficient of correlation on the assumption of normality is to be avoided.
Clearly, even if the table be isotropic it need not be normal, but at least
the test for isotropy affords a rapid and simple means for excluding certain
distributions which are not even remotely normal. Table 5.2, page 66,
might possibly be regarded as a grouping of normally distributed frequency
if rearranged as suggested in 5.15 (it would be worth the investigator's
while to proceed further and compare the actual distribution with a fitted
normal distribution), but Table 5.4 could not be regarded as normal, and
could not be rearranged so as to give a grouping of normally distributed
frequency.

12.19. If the frequencies in a contingency table be not large, and
also if the contingency or correlation be small, the influence of casual
irregularities due to fluctuations of sampling may render it difficult to
say whether the distribution may be regarded as essentially isotropic or
not. In such cases some further condensation of the table by grouping
together adjacent rows and columns, or some process of “smoothing”
by averaging the frequencies in adjacent compartments, may be of service.
The correlation table for stature in father and son (Table 11.3), for
instance, is obviously not strictly isotropic as it stands: we have seen,
however, that it appears to be normal, within the limits of fluctuations
of sampling, and it should consequently be isotropic within such limits.
We can apply a rough test by regrouping the table in a much coarser
form, say with four rows and four columns: the table below exhibits such
a grouping, the limits of rows and of columns having been so fixed as to
include not less than 200 observations in each array.


Table 12.1. (Condensed from Table 11.3, p. 199.)

                              Father's Stature (inches).
Son's Stature        Under                                   69.5
(inches).            65.5      65.5-67.5     67.5-69.5      and over      Total.

Under 66.5           97.5        74.25         34.75          10.5         217
66.5-68.5            76.5       108            85             52           321.5
68.5-70.5            33.25       64.75         95             84.5         277.5
70.5 and over        14.75       32.5          80.75         134           262

Total               222         279.5         295.5          281          1078


Taking the ratio of the frequency in column 1 to the sum of the frequencies
in columns 1 and 2 for each successive row, and so on for the other pairs of
columns, we find the following series of ratios:





Table 12.2. Ratio of Frequency in Column m to Frequency in Column m
+ Frequency in Column (m + 1) of Table 12.1.

                    Columns
Row.     1 and 2    2 and 3    3 and 4

1         0.568      0.681      0.768
2         0.415      0.560      0.620
3         0.339      0.405      0.529
4         0.312      0.287      0.376


These ratios decrease continuously as we pass from the top to the bottom
of the table, and the distribution, as condensed, is therefore isotropic.
The student should form one or two other condensations of the original
table to 3 × 3 or 4 × 4-fold form: he will probably find them either isotropic
or diverging so slightly from isotropy that an alteration of the frequencies,
well within the margin of possible fluctuations of sampling, will render the
distribution isotropic.
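
The rough test just described can be repeated mechanically for any condensation of the table. The following is a minimal sketch in Python, assuming the frequencies of Table 12.1 as reconstructed from the row and column totals; the function names are illustrative and are not part of the text.

```python
# Frequencies of Table 12.1: rows are son's stature groups,
# columns are father's stature groups (marginal totals omitted).
table_12_1 = [
    [97.5, 74.25, 34.75, 10.5],
    [76.5, 108.0, 85.0, 52.0],
    [33.25, 64.75, 95.0, 84.5],
    [14.75, 32.5, 80.75, 134.0],
]


def adjacent_column_ratios(table):
    """For each row, return f[m] / (f[m] + f[m+1]) for adjacent columns m, m+1."""
    return [
        [row[m] / (row[m] + row[m + 1]) for m in range(len(row) - 1)]
        for row in table
    ]


def ratios_decrease_down_rows(ratios):
    """True if every column of ratios falls strictly from the top row to the bottom."""
    columns = list(zip(*ratios))
    return all(
        upper > lower
        for column in columns
        for upper, lower in zip(column, column[1:])
    )


ratios = adjacent_column_ratios(table_12_1)
for row_number, row in enumerate(ratios, start=1):
    print(row_number, [round(value, 3) for value in row])
print("ratios decrease continuously:", ratios_decrease_down_rows(ratios))
```

The same two functions may be applied to any other 3 × 3 or 4 × 4 condensation the student forms from Table 11.3.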

Relationship between Contingency and Normal Correlation. 

12.20. It was shown by Karl Pearson that if a normal bivariate
universe is divided into sections so as to form a contingency table, the
coefficient of mean square contingency, C, tends to the value r in magnitude
as the intervals become finer and finer, though of course it is always
positive in sign. It was, in fact, the relation

$$r = \sqrt{\frac{\phi^2}{1 + \phi^2}}$$

where $\phi^2$ is the mean square contingency, which led Pearson to identify
C with the expression on the right.

The values of C and r for the distributions of some of the tables of
Chapter 11 were compared in Exercise 11.8, page 225.


SUMMARY. 

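1. The equation of the normal surface is

$$y_{12} = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1 - r_{12}^2}}
\exp\left\{-\frac{1}{2(1 - r_{12}^2)}\left(\frac{x_1^2}{\sigma_1^2}
- \frac{2 r_{12} x_1 x_2}{\sigma_1\sigma_2} + \frac{x_2^2}{\sigma_2^2}\right)\right\}$$

where $\sigma_1$ is the s.d. of $x_1$, $\sigma_2$ that of $x_2$, and $r_{12}$ the
correlation between $x_1$ and $x_2$.

This may also be written

$$y_{12} = \frac{N\sqrt{1 - r_{12}^2}}{2\pi\sigma_{1.2}\sigma_{2.1}}
\exp\left\{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_{1.2}^2}
- \frac{2 r_{12} x_1 x_2}{\sigma_{1.2}\sigma_{2.1}} + \frac{x_2^2}{\sigma_{2.1}^2}\right)\right\}$$

where

$$\sigma_{1.2}^2 = \sigma_1^2(1 - r_{12}^2), \qquad \sigma_{2.1}^2 = \sigma_2^2(1 - r_{12}^2).$$
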

2. For two variates normally correlated the standard deviations of 
parallel arrays are equal and the regressions are linear. 

3. Any section of the normal surface by a vertical plane is a normal 
curve, and a section by a horizontal plane is an ellipse. The ellipses given 
by horizontal sections are similar and similarly situated. 

4. The bivariate normal distribution is isotropic. 

5. A linear function of variates, each of which is normally distributed, 
is also normally distributed. 


EXERCISES. 

12.1. Deduce equation (12.12) from the equations for transformation of 
co-ordinates without assuming the normal distribution. (A proof will be found 
in ref. (248).) 

12.2. Hence show that if the pairs of observed values of $x_1$ and $x_2$ are
represented by points on a plane, and a straight line drawn through the mean, the
sum of the squares of the distances of the points from this line is a minimum
if the line is the major principal axis.

12.3. The coefficient of correlation with reference to the principal axes being
zero, and with reference to other axes something, there must be some pair of axes
at right angles for which the correlation is a maximum, i.e. is numerically
greatest without regard to sign. Show that these axes make an angle of 45°
with the principal axes, and that the maximum value of the correlation is

$$\frac{\Sigma_1^2 - \Sigma_2^2}{\Sigma_1^2 + \Sigma_2^2}$$

where $\Sigma_1$, $\Sigma_2$ are the standard deviations with reference to the
principal axes.

12.4. (Sheppard, ref. (258).) A fourfold table is formed from a normal
correlation table, taking the points of division between A and α, B and β, at the
medians, so that (A) = (α) = (B) = (β) = N/2. Show that

$$r = \cos\left\{\left(1 - \frac{2(AB)}{N}\right)\pi\right\}$$

12.5. Show that the points of inflexion of the sections of the normal surface
by vertical planes through the mean of the distribution lie on an ellipse; and
show how this ellipse may be used to give the standard deviations of such
sections.

12.6. Hence find the minimum and maximum standard deviations which
can be taken by such sections, and show that any specified value of the s.d. 
between the minimum and maximum will be given by two, and only two, 
sections. 

12.7. Find the conditions that the surface

$$z = k e^{-(ax^2 + 2hxy + by^2)}$$

can represent a normal correlation surface whose variates are x and y. Assuming
these conditions satisfied, express $\sigma_1$, $\sigma_2$ and $r_{12}$ in terms of a, h and b.



CHAPTER 13. 


FURTHER THEORY OF CORRELATION. 

Methods of Estimating the Product-moment Correlation Coefficient. 

13.1. The only strict method of calculating the correlation coefficient
is that described in Chapter 11, from the formula

$$r = \frac{S(xy)}{N\sigma_x\sigma_y}$$

Where possible this formula should be employed. It sometimes happens,
however, owing to incomplete data, that we are constrained to use some
method of approximation. Furthermore, the large amount of arithmetical
labour involved in applying the ordinary formula may sometimes be
avoided by approximations which are sufficiently accurate for the purpose
in view. We therefore proceed to give a few methods of this kind. They
are not recommended for general use as they will, as a rule, lead to different
results in different hands.

13.2. (1) The means of rows and columns are plotted on a diagram,
and lines fitted to the points by eye, say by shifting about a stretched black
thread until it seems to run as near as may be to all the points. If $b_1$, $b_2$ be
the slopes of these two lines to the vertical and the horizontal respectively,

$$r = \sqrt{b_1 b_2}$$

Hence the value of r may be estimated from any such diagram as fig. 11.8
or 11.9, in the absence of the original table. Further, if a correlation table
be not grouped by equal intervals, it may be difficult to calculate the
product sum, but it may still be possible to plot approximately a diagram
of the two lines of regression, and so determine roughly the value of r.
Similarly, if only the means of two rows and two columns, or of one row and
one column in addition to the means of the two variables, are known, it will
still be possible to estimate the slopes of RR and CC, and hence the
correlation coefficient.

(2) The means of one set of arrays only, say the rows, are calculated,
and also the two standard deviations $\sigma_x$ and $\sigma_y$. The means are then
plotted on a diagram, using the standard deviation of each variable as the
unit of measurement, and a line fitted by eye. The slope of this line to the
vertical is r. If the standard deviations be not used as the units of
measurement in plotting, the slope of the line to the vertical is
$r\sigma_x/\sigma_y$, and hence r will be obtained by dividing the slope by the
ratio of the standard deviations.

This method, or some variation of it, is often useful as a makeshift when
the data are too incomplete to permit of the proper calculation of the
correlation, only one line of regression and the ratio of the dispersions of
the two variables being required: the ratio of the quartile deviations, or
other simple measures of dispersion, will serve quite well for rough purposes
in lieu of the ratio of standard deviations. As a special case, we may note
that if the two dispersions are approximately the same, the slope of RR to
the vertical is r.

Plotting the medians of arrays on a diagram with the quartile deviations
as units, and measuring the slope of the line, was the method of
determining the correlation coefficient (“Galton’s function”) used by Sir
Francis Galton, to whom the introduction of such a coefficient is due
(refs. (242) and (243), cf. also ref. (215)).

(3) If $s_x$ be the standard deviation of errors of estimate like $x - b_1 y$,
we have, from 11.24,

$$s_x^2 = \sigma_x^2(1 - r^2)$$

and hence,

$$r = \sqrt{1 - \frac{s_x^2}{\sigma_x^2}}$$

But if the dispersions of arrays do not differ largely, and the regression is
nearly linear, the value of $s_x$ may be estimated from the average of the
standard deviations of a few rows, and r determined (or rather estimated)
accordingly. Thus in Table 11.3 the standard deviations of the ten
columns headed 62.5-63.5, 63.5-64.5, etc., are:

2.50    2.26
2.11    2.20
2.55    2.15
2.24    2.33
2.23    2.60        Mean 2.359

The standard deviation of the stature of all sons is 2.75: hence
approximately

$$r = \sqrt{1 - \left(\frac{2.359}{2.75}\right)^2} = 0.514$$

This is the same as the value found by the product-sum method to the
second decimal place. It would be better to take an average by counting
the square of each standard deviation once for each observation in the
column (or “weighting” it with the number of observations in the column),
but in the present case this would only lead to a very slightly different
result, viz. s = 2.362, r = 0.512.
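
The arithmetic of method (3) is short enough to check directly. The sketch below is illustrative only: it takes the average array standard deviation as 2.359 (the figure that reproduces the quoted r = 0.514) and the weighted figure as 2.362, with 2.75 for the standard deviation of all sons; the function name is an assumption of this example.

```python
import math


def r_from_array_sd(array_sd, total_sd):
    """Estimate r from s^2 = sigma^2 (1 - r^2).

    Only the magnitude of r is recovered; its sign must be judged
    from the direction of the regression.
    """
    return math.sqrt(1.0 - (array_sd / total_sd) ** 2)


sigma_sons = 2.75  # s.d. of the stature of all sons, as given in the text

r_simple = r_from_array_sd(2.359, sigma_sons)    # simple average of array s.d.'s
r_weighted = r_from_array_sd(2.362, sigma_sons)  # weighted average quoted in the text

print(round(r_simple, 3), round(r_weighted, 3))
```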

Non-linear Regression. 

13.3. We referred in Chapter 11 to the fact that the treatment of
cases when the regression is non-linear is somewhat difficult. We may, by
the methods of Chapter 17, and otherwise, fit curves of any order to the
means of arrays, just as we have fitted straight lines to them; but the
handling of these regression curves and their interpretation is far more
complicated.




13.4. It is therefore desirable, wherever possible, to deal with variates
which result in linear regression. Now it sometimes happens that if a
relation between X and Y be suggested, we may, either by theory or by
previous experience, throw that relation into the form

$$Y = A + B\phi(X)$$

where A and B are the only unknown constants to be determined. If
a correlation table be then drawn up between Y and $\phi(X)$ instead of Y
and X, the regression will be approximately linear. Thus in Table 11.5,
page 201, if X be the rate of discount and Y the percentage of reserves
on deposits, a diagram of the curves of regression suggests that the
relation between X and Y is approximately of the form

$$X(Y - B) = A$$

A and B being constants; that is,

$$XY = A + BX$$

Or, if we make XY a new variable, say Z,

$$Z = A + BX$$

Hence, if we draw up a new correlation table between X and Z the
regression will probably be much more closely linear.

If the relation between the variables be of the form

$$Y = AB^X$$

we have

$$\log Y = \log A + X \log B$$

and hence the relation between log Y and X is linear. Similarly, if the
relation be of the form

$$X^n Y = A$$

we have

$$\log Y = \log A - n \log X$$

and so the relation between log Y and log X is linear. By means of
such artifices for obtaining correlation tables in which the regression is
linear, it may be possible to do a good deal in difficult cases whilst using
elementary methods only. The advanced student should refer to refs.
(273) and (377) for different methods of treatment.
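
The effect of such a transformation may be seen on artificial figures. The sketch below assumes hypothetical constants A = 2 and B = 1.5 (not taken from any table in this book) and exact values of Y = AB^X with no scatter; it compares the product-moment correlation of X with Y and of X with log Y.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation S(xy) / sqrt(S(x^2) S(y^2)), deviations from means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


A, B = 2.0, 1.5                       # hypothetical constants, for illustration only
xs = [float(x) for x in range(1, 9)]  # X = 1, 2, ..., 8
ys = [A * B ** x for x in xs]         # Y = A * B^X exactly
log_ys = [math.log(y) for y in ys]    # log Y = log A + X log B

r_raw = pearson_r(xs, ys)      # curved relation: below unity
r_log = pearson_r(xs, log_ys)  # linear relation: unity, up to rounding

print(round(r_raw, 3), round(r_log, 3))
```

With observed data the points would of course scatter about the line, and the transformed correlation would fall short of unity.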


The Correlation Ratios. 

13.5. In view of the importance of linearity of regression it is
desirable to have some criterion which will enable a judgment to be
formed whether a regression is, within the limits permitted by sampling
fluctuations, linear in any given case. We now proceed to discuss a
coefficient designed for this purpose.

Consider a bivariate frequency table, and let $\sigma_{px}$ be the standard
deviation of the pth array of X's. Let $n_p$ be the number of observations
in this array. Let

$$\sigma_{ax}^2 = \frac{1}{N} S(n_p \sigma_{px}^2) \qquad (13.1)$$

Then $\sigma_{ax}^2$ is the weighted mean of the variances of arrays, obtained as
suggested in the last sentence of 13.2 (3). Now, let

$$\sigma_{ax}^2 = \sigma_x^2(1 - \eta_{xy}^2) \qquad (13.2)$$

Then $\eta_{xy}$ is called the correlation ratio of X on Y. Similarly, $\eta_{yx}$,
defined by

$$\eta_{yx}^2 = 1 - \frac{\sigma_{ay}^2}{\sigma_y^2} \qquad (13.3)$$

is called the correlation ratio of Y on X.

13.6. The correlation ratios may be put in another form, which is
much more convenient for purposes of calculation.

In fact, if $M_x$ is the mean of all the X's and $m_{px}$ the mean of an array,
we have, as in equation (8.6),

$$N\sigma_x^2 = S[n_p\{\sigma_{px}^2 + (M_x - m_{px})^2\}]$$

or, using $\sigma_{mx}$ to denote the standard deviation of $m_{px}$, obtained by
“weighting” each $m_{px}$ according to $n_p$, the number of observations in
the array in which it occurs,

$$\sigma_x^2 = \sigma_{ax}^2 + \sigma_{mx}^2 \qquad (13.4)$$

Hence, substituting in (13.2),

$$\eta_{xy}^2 = \frac{\sigma_{mx}^2}{\sigma_x^2} \qquad (13.5)$$

The correlation ratio of X on Y is therefore determined when we have
found the standard deviation of X and the standard deviation of the
means of its arrays.

13.7. In 11.22 we saw that

$$\sigma_x^2(1 - r^2) = \frac{1}{N} S(x - b_1 y)^2 \qquad (13.6)$$

where $x - b_1 y = 0$ is the line of regression of x on y, x and y being the
values of X and Y measured from the mean of the distribution.

Now, for any array for which y is constant,

$$S(x - b_1 y)^2 = S\{(x - m_{px}) + (m_{px} - b_1 y)\}^2
= S(x - m_{px})^2 + n_p(m_{px} - b_1 y)^2$$

the product term vanishing since $S(x - m_{px}) = 0$. Hence, summing for all
arrays of y,

$$\sigma_x^2(1 - r^2) = \sigma_{ax}^2 + \frac{1}{N} S\{n_p(m_{px} - b_1 y)^2\}$$

But

$$\sigma_{ax}^2 = \sigma_x^2(1 - \eta_{xy}^2)$$

Hence,

$$\sigma_x^2(\eta_{xy}^2 - r^2) = \frac{1}{N} S\{n_p(m_{px} - b_1 y)^2\} \qquad (13.7)$$

From this we see that $\eta_{xy}$ cannot be less than r in absolute value.

If $\eta_{xy}^2 = r^2$, then

$$(m_{px} - b_1 y)^2 = 0$$

i.e.

$$m_{px} = b_1 y$$

for all arrays. This means that the mean $m_{px}$ must be on the line of
regression for all arrays, i.e. that the regression is linear.

13.8. The divergence of $\eta^2$ from $r^2$ therefore measures the departure
of the regression from linearity. It should, however, be noted that
sampling fluctuations may cause $\eta^2 - r^2$ to deviate from zero even when
the regression is truly linear. We give later a method of testing the
significance of observed fluctuations of this kind (23.44).

Calculation of the Correlation Ratio. 

13.9. The table on page 246 (Example 13.1) illustrates the form of the
arithmetic for the calculation of the correlation ratio of son's stature on
father's stature (Table 11.3). In the first column is given the type of the array
(stature of father); in the second, the mean stature of sons for that array;
in the third, the difference of the mean of the array from the mean stature
of all sons. In the fourth column these differences are squared, and in
the sixth they are multiplied by the frequency of the array, two decimal
places only having been retained as sufficient for the present purpose.
The sum-total of the last column divided by the number of observations
(1078) gives $\sigma_{my}^2 = 2.058$, or $\sigma_{my} = 1.43$. As the standard deviation of
the sons' stature is 2.75 in., $\eta_{yx} = 0.52$. Before taking the differences for
the third column of such a table, it is as well to check the means of the
arrays by recalculating from them the mean of the whole distribution,
i.e. multiplying each array-mean by its frequency, summing and dividing
by the number of observations. The form of the arithmetic may be
varied, if desired, by working from zero as origin, instead of taking
differences from the true mean. The square of the mean must then be
subtracted from $S(fm^2)/N$ to give $\sigma_{my}^2$.

13.10. If the second correlation ratio for this table be worked out in
the same way, the value will be found to be the same to the second place
of decimals: the two correlation ratios for this table are, therefore, very
nearly identical, and only slightly greater than the correlation coefficient
(0.51). Both regressions, as follows from the last section, are very nearly
linear, a result confirmed by the diagram of the regression lines (fig. 11.8,
page 211). On the other hand, it is evident from fig. 11.10, page 213,



Example 13.1. Calculation of the Correlation Ratio: Son's Stature on
Father's Stature: Data of Table 11.3, p. 199.

   1.           2.           3.             4.            5.           6.
Type of      Mean of      Difference     Square of     Frequency.   Frequency ×
Array        Array        from Mean      Difference.                (difference)².
(Father's    (Son's       of all Sons
Stature).    Stature).    (68.66).

   59         64.67        -3.99          15.9201         3           47.76
   60         65.64        -3.02           9.1204         3.5         31.92
   61         66.34        -2.32           5.3824         8           43.06
   62         65.56        -3.10           9.6100        17          163.37
   63         66.68        -1.98           3.9204        33.5        131.33
   64         66.74        -1.92           3.6864        61.5        226.71
   65         67.19        -1.47           2.1609        95.5        206.37
   66         67.61        -1.05           1.1025       142          156.56
   67         67.95        -0.71           0.5041       137.5         69.31
   68         69.07        +0.41           0.1681       154           25.89
   69         69.39        +0.73           0.5329       141.5         75.41
   70         69.74        +1.08           1.1664       116          135.30
   71         70.50        +1.84           3.3856        78          264.08
   72         70.87        +2.21           4.8841        49          239.32
   73         72.00        +3.34          11.1556        28.5        317.93
   74         71.50        +2.84           8.0656         4           32.26
   75         71.73        +3.07           9.4249         5.5         51.84

 Total         ...          ...             ...        1078         2218.42

$\sigma_{my}^2 = 2218.42/1078 = 2.058$;  $\sigma_{my} = 1.43$;
$\eta_{yx} = 1.43/2.75 = 0.52$


that we should expect the two correlation ratios for Table 11.6 to differ
considerably from each other and from the correlation coefficient. The
values found are $\eta_{xy} = 0.14$, $\eta_{yx} = 0.38$ (r = 0.014): $\eta_{xy}$ is comparatively
low as proportions of male births differ little in the successive arrays,
but $\eta_{yx}$ is higher since the line of regression of Y on X is sharply curved.
The confirmation of these values is left to the student.

The student should notice that the correlation ratio only affords a 
satisfactory test when the number of observations is sufficiently large for
a grouped correlation table to be formed. In the case of a short series of 
observations such as that given in Table 11.7, page 20 :*, the method is 
inapplicable. 
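
The arithmetic of Example 13.1, including the check on the array means recommended in 13.9, may be sketched as follows. The array means and frequencies are those of the example, read so that the frequencies total 1078 and the last column totals 2218.42; the variable names are illustrative only.

```python
import math

# (father's stature, mean stature of sons in that array, frequency of the array)
arrays = [
    (59, 64.67, 3.0), (60, 65.64, 3.5), (61, 66.34, 8.0), (62, 65.56, 17.0),
    (63, 66.68, 33.5), (64, 66.74, 61.5), (65, 67.19, 95.5), (66, 67.61, 142.0),
    (67, 67.95, 137.5), (68, 69.07, 154.0), (69, 69.39, 141.5), (70, 69.74, 116.0),
    (71, 70.50, 78.0), (72, 70.87, 49.0), (73, 72.00, 28.5), (74, 71.50, 4.0),
    (75, 71.73, 5.5),
]
mean_sons = 68.66  # mean stature of all sons
sd_sons = 2.75     # standard deviation of the stature of all sons

n_total = sum(freq for _, _, freq in arrays)

# Check suggested in the text: recover the general mean from the array means.
check_mean = sum(freq * mean for _, mean, freq in arrays) / n_total

# Column 6 of the example: frequency x (difference from the general mean)^2.
weighted_squares = sum(freq * (mean - mean_sons) ** 2 for _, mean, freq in arrays)

sd_of_array_means = math.sqrt(weighted_squares / n_total)
eta_yx = sd_of_array_means / sd_sons

print(n_total, round(check_mean, 2), round(weighted_squares, 2))
print(round(sd_of_array_means, 2), round(eta_yx, 2))
```

The total of column 6 computed without rounding each product differs from 2218.42 only in the third decimal place.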

The Rank Correlation Coefficient. 

13.11. In calculating the coefficient of correlation from the product-
moment it is necessary that the data should be definitely measured. If
they are not so measured we cannot, in general, determine the coefficient,
though we may sometimes approximate to it by one of the methods of
13.2.

But there may be more serious obstacles than imperfect grouping in
the way of finding the correlation between two variates. In the examples
we have considered up to the present the qualities we have discussed have
been easily measurable, involving such familiar concepts as height, weight,
age and so forth. In certain types of inquiry we may have to deal with
qualities which are not expressible as numbers of units of an objective
kind.

13.12. Consider, for instance, the relation between mathematical 
and musical ability in a class of students. “ Ability,” whether of a general 
or a specific kind, is a variate in the sense that it varies from one individual 
to another ; and it may be a numerical variate if we can decide on some 
unequivocal way of measuring it. A very common mode of attempting 
to do so is by allotting marks to each student. But such methods are open 
to many objections, not the least of which is that different examiners would 
give different marks to the same person. A correlation between the marks 
obtained for mathematics and music would, therefore, be likely to depend
to some extent on the examiner, and would not reflect accurately the 
relationship between the two qualities. 

13.13. Difficulties of this type disappear to some extent if we arrange
the students in order of their ability, but do not attempt to assess it
numerically. There will still be some divergence of opinion between
different examiners, perhaps, but it will not as a rule be so serious. We
then allot to each student a number which indicates his position in the
arrangement according to ability, the first being number 1, the second
number 2, and so on. The students are then said to be ranked, and the
number of a particular individual is his rank (cf. 8.32).

13.14. A procedure of this kind is useful in the treatment not only 
of data which can be ordered but not exactly measured, but of measurable 
data also. For instance, we can easily rank a number of men according 
to height without actually measuring them. It is also comparatively easy 
to rank a number of shades of a colour, or a number of countries according 
to their importance in the export market, where precise numerical measure¬ 
ment would be very troublesome. 

13.15. If we have a set of individuals ranked according to two 
different qualities it is natural to inquire whether the ranks can be made 
to give us some measure of the degree of relation between the two qualities. 

Suppose we have n individuals, whose ranks according to quality A are
$X_1, X_2, X_3, \ldots X_n$, and according to quality B are $Y_1, Y_2, Y_3, \ldots Y_n$,
where the X's and Y's are merely permutations of the first n natural
numbers. Let $d_k = X_k - Y_k$.

The values of d form a convenient measure of the closeness of the
correspondence between A and B. If all the d's are zero the correspondence
is perfect, for an individual whose rank is $X_k$ for A will also be $X_k$ for B.
We cannot, however, take the sum of the d's as a measure of correspondence,
because that sum is zero; for the sum of the differences of the X's and Y's
is the difference of the sums of the X's and the Y's, each of which is the sum
of the first n natural numbers.

A possible measure which suggests itself is the sum of the absolute values
of the d's, i.e. $S|d|$. This measure and its mean $\frac{1}{n}S|d|$ have, in fact, been
used, but like the mean deviation (8.17) they have certain analytical
disadvantages.

13.16. A more convenient coefficient is obtained as follows:

The values of X range from 1 to n. Their sum is $\frac{n(n+1)}{2}$ and their
mean is accordingly $\frac{n+1}{2}$. This value is also the mean of the Y's.

Let us denote by $x_k$ the value of $X_k - \frac{n+1}{2}$, i.e. the divergence of $X_k$
from the mean. Similarly for $y_k$, which we define as $Y_k - \frac{n+1}{2}$.
Write

$$\rho = \frac{S(xy)}{\sqrt{S(x^2)S(y^2)}} \qquad (13.8)$$

This is the product-moment coefficient of correlation between X and Y.

We shall call $\rho$ the rank correlation coefficient. It may be expressed
very simply in terms of n and the d's.

For, as we saw in 8.14, $S(x^2)$ is $\frac{n^3 - n}{12}$, and $S(y^2)$ has the same value.

Now,

$$S(d^2) = S(X_k - Y_k)^2 = S(x - y)^2 = S(x^2) + S(y^2) - 2S(xy)$$

Hence,

$$S(xy) = \frac{n^3 - n}{12} - \tfrac{1}{2}S(d^2)$$

and substituting in (13.8):

$$\rho = 1 - \frac{6S(d^2)}{n^3 - n} \qquad (13.9)$$


Example 13.2. The rankings of ten students in mathematics and
music are as follows:

Mathematics: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Music:       6, 5, 1, 4, 2, 7, 8, 10, 3, 9

What is the coefficient of rank correlation?

The differences d are (mathematical rank minus musical rank)

-5, -3, +2, 0, +3, -1, -1, -2, +6, +1

These add to zero, as they should.

The squares of d are

25, 9, 4, 0, 9, 1, 1, 4, 36, 1

which add up to 90.
Hence, from (13.9),

$$\rho = 1 - \frac{540}{990} = +0.45$$
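
The working of Example 13.2 may be verified as below. The sketch computes the coefficient both from formula (13.9) and directly as the product-moment correlation of the ranks, as in (13.8), to confirm that the two agree; the function names are illustrative.

```python
import math


def rank_correlation(x_ranks, y_ranks):
    """Rank correlation from formula (13.9): 1 - 6 S(d^2) / (n^3 - n)."""
    n = len(x_ranks)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1.0 - 6.0 * sum_d2 / (n ** 3 - n)


def product_moment(xs, ys):
    """Product-moment correlation of the ranks, as in (13.8)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


mathematics = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
music = [6, 5, 1, 4, 2, 7, 8, 10, 3, 9]

d = [x - y for x, y in zip(mathematics, music)]
sum_d2 = sum(value ** 2 for value in d)

rho = rank_correlation(mathematics, music)
rho_direct = product_moment(mathematics, music)

print(d, sum(d), sum_d2)
print(round(rho, 2), round(rho_direct, 2))
```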


13.17. The rank correlation coefficient varies from +1 to -1. If the
rank correlation is perfect, all the d's are zero. If, on the other hand, the
ranks are such that the first, second, third, ... in one order correspond to the
nth, (n-1)th, (n-2)th, ... in the other, $\rho = -1$. The proof is slightly
different according to whether n is even or odd. If it is odd, say = 2m + 1,
the d's are

2m, 2m-2, ... 2, 0, -2, ... -(2m-2), -2m

and

$$S(d^2) = 2\{(2m)^2 + (2m-2)^2 + \ldots + 2^2\} = \frac{8m(m+1)(2m+1)}{6}$$

Hence,

$$\rho = 1 - \frac{8m(m+1)(2m+1)}{(2m+1)\{(2m+1)^2 - 1\}} = -1$$

If n is even, say 2m, the d's are 2m-1, 2m-3, ... 1, -1, ... -(2m-1), so that

$$S(d^2) = 2\{(2m-1)^2 + (2m-3)^2 + \ldots + 1^2\} = \frac{2m(4m^2-1)}{3}$$

and

$$\rho = 1 - \frac{4m(4m^2-1)}{2m(4m^2-1)} = -1$$
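
The result that a complete reversal of order gives a coefficient of -1, whether n be odd or even, may be confirmed numerically for small n. The following sketch assumes formula (13.9) and an arbitrary range of n from 2 to 11; the names are illustrative.

```python
def rank_correlation(x_ranks, y_ranks):
    """Rank correlation from formula (13.9): 1 - 6 S(d^2) / (n^3 - n)."""
    n = len(x_ranks)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1.0 - 6.0 * sum_d2 / (n ** 3 - n)


reversed_rhos = {}
for n in range(2, 12):                  # covers both odd and even n
    forward = list(range(1, n + 1))     # ranks 1, 2, ..., n
    backward = forward[::-1]            # ranks n, n-1, ..., 1
    reversed_rhos[n] = rank_correlation(forward, backward)

# Identical rankings, for comparison, give the upper limit of +1.
identical_rho = rank_correlation([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])

print(reversed_rhos)
print(identical_rho)
```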



Relationship between Rank Correlation and Product-moment 
Correlation. 

13.18. The rank correlation coefficient as we have introduced it is 
merely a measure, like the coefficients of association, contingency and 
product-moment correlation, of the correspondence between two quantities. 
Like those coefficients, it is affected by sampling fluctuations. 

It is, however, more easily calculated than most coefficients, and for
this reason some writers have advocated its use as a substitute for the
product-moment coefficient between the actual measurements, and for 
estimating the product-moment coefficient from a normal universe. We 
proceed to examine this practice briefly. 


Grade Correlation. 

13.19. We referred at the end of Chapter 8 to such quantities as
quartiles, deciles and percentiles, which are values of the variate dividing
the total frequency into certain specified proportions. For instance, the
seventh decile is the variate value such that seven-tenths of the distribution
lie below it, i.e. exhibit values of the variate less than the decile.

Generally, we may regard the grade of an individual as the proportion
of individuals which lie below him (cf. 8.30). If the universe is continuous,
the range of grades will also be continuous.

13.20. To each individual in a bivariate universe there will be
attached two grade numbers, one for each variate, and if the universe is
correlated the grades will also be correlated. In fact, Karl Pearson has
shown that if the universe is normal, $\rho$, the grade correlation, and r, the
ordinary correlation (both calculated by the product-moment method), are
related by the equation

$$r = 2\sin\left(\frac{\pi\rho}{6}\right) \qquad (13.10)$$

1 The property of varying between +1 and -1 (see 13.17) does not belong to a
similar coefficient proposed by Spearman, and known as his “foot-rule,” viz.

$$R = 1 - \frac{3S|d|}{n^2 - 1}$$

It may be shown in the above manner that R varies from -0.5 to +1, and for this
reason alone R seems an undesirable coefficient.

13.21. Ranks and grades are connected by a simple relation. In
fact, if an individual is of rank k, there are k - 1 individuals below him
(assuming that the ranking proceeds from the lowest variate value). If
we admit, conventionally, that one-half of the individual is to be regarded
as lying to the left of the line of division which he makes, and one-half to
the right, his grade, $g_k$, is given by

$$g_k = \frac{1}{n}\left\{(k - 1) + \tfrac{1}{2}\right\} \qquad (13.11)$$

It follows that the correlation between ranks is the same as the
correlation between grades. But in a universe which is finite and discontinuous
(and ranking is in practice applied to comparatively small universes of
twenty or thirty individuals) it does not follow that

$$r = 2\sin\left(\frac{\pi\rho}{6}\right) \qquad (13.12)$$

where $\rho$ is now the rank correlation coefficient.
Equation (13.10) was obtained by considering grades in a continuous
universe, and equation (13.12) is at best an approximation, depending on
assumptions which are often of doubtful legitimacy. This is a fact which
has not always been appreciated. We may, perhaps, clarify the point by
considering the data of Example 13.2.

Example 13.3. In Example 13.2 we found:

$$\rho = +0.45$$

If we apply (13.12) we find:

$$r = 2\sin 13.5^\circ = +0.47$$

Let us consider what this means.

The value r purports to be a correlation coefficient such as would have 
been obtained by the product-moment method if the two variates had been 
measurable in the ordinary way. Let us, for the sake of argument, agree 
that mathematical and musical abilities are capable of measurement. 

Now there are only ten members in this universe, and it cannot be
regarded with any degree of accuracy as a continuous normal universe.
The use of (13.12) in finding the correlation in the universe of ten is
therefore of doubtful validity, to say the least.

But it is possible to look at this from rather a different point of view, 
and to regard the ten students as a sample from a practically infinite 
universe which is continuous and normal. The value r is then taken to be 
an estimate of the correlation coefficient in this universe. 

The legitimacy of this procedure will depend on the extent to which the
grade correlation in the sample can be taken to represent the grade
correlation in the universe. It will, we think, be sufficiently evident from the
smallness of the sample that the two are likely to diverge considerably
owing to sampling fluctuations.

Furthermore, in the comparatively small samples to which (13.12) is 
applied—the labour of calculating the rank correlation coefficient for large 
samples is very tedious—it is difficult to obtain any satisfactory evidence 
from the data themselves that the universe can properly be regarded as 
normal; and even if the distribution of each of the variates, taken singly, 
can be rendered normal by some appropriate transformation of the 
variate which squeezes or stretches the scale of measurement, it does not
necessarily follow that the correlation distribution can in this way be
rendered normal.

In practice, moreover, troublesome difficulties sometimes arise owing to
two or more individuals being given the same rank. The common procedure
of assigning to each individual the average rank of the group, but
nevertheless using formula (13.9), is inexact.

Use of (13.12) should therefore be made with the utmost reserve. It 
would probably be better to avoid it altogether and rely on the rank 
correlation coefficient. 
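
For reference, the arithmetic of Example 13.3 is sketched below. This merely evaluates the relation r = 2 sin(πρ/6) for the value ρ = 0.45 of Example 13.2; it is subject to all the reservations just stated and is not offered as a recommended procedure. The function name is illustrative.

```python
import math


def r_from_grade_correlation(rho):
    """Pearson's relation r = 2 sin(pi * rho / 6), strictly valid only for
    grades in a continuous normal universe."""
    return 2.0 * math.sin(math.pi * rho / 6.0)


rho = 0.45                          # rank correlation found in Example 13.2
angle_degrees = 180.0 * rho / 6.0   # the angle pi * rho / 6 expressed in degrees
r = r_from_grade_correlation(rho)

print(angle_degrees, round(r, 2))

# The relation leaves the extreme values unchanged: rho = 1 gives r = 1.
r_at_unity = r_from_grade_correlation(1.0)
```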

13.22. The relationship between the product-moment coefficient and 
the rank correlation coefficient might profitably be subjected to further
investigation, particularly for small numbers of individuals. As we have just
seen, with the present state of our knowledge, the use of the rank coefficient 
is not to be recommended as a brief method of estimating the product- 
moment coefficient. It appears, however, to be of service as a quick 
method of gauging relations between variates which are not normally 
distributed, or between quantities which cannot readily be measured, 
when the number of observations is small. 

Tetrachoric r. 

13.23. To complete our account of methods which have been devised
as alternatives to the use of the product-moment correlation coefficient in
cases where, for some reason, that coefficient cannot be computed, we may
refer to a process specially adapted to the 2 × 2 contingency table.

Consider such a table in the schematic form:

              A         Not-A      Total

B             a           b        a + b
Not-B         c           d        c + d

Total       a + c       b + d        N


Let us assume that our attributes A and B are, in theory, based on
measurable quantities; and let us suppose further that the universe would
be normally distributed with respect to those quantities as variates. Then
we may regard the above table as the result obtained by dividing a bivariate
normal universe into four sections, a division of the X-variate at some
point, say h, and a division of the Y-variate at some point k. If we
picture the universe as a solid figure, as in fig. 11.1, page 204, the frequencies
a, b, c and d will be the volumes into which the universe is divided by
planes perpendicular to the X and Y axes through the points X = h
and Y = k, respectively.

The problem then arises, given a, b, c and d, what are the values of
h and k (in terms of the standard deviations of X and Y), and what
is the value of r?

13.24. A discussion of this problem, which involves some difficult
mathematics, is outside the scope of this book. The student may be
referred to “Tables for Statisticians and Biometricians, Parts I and II,” for
a short account of the method of solution and for tables which are almost
indispensable in working out r for any given case.

A value of r obtained in this way is said to be tetrachoric. 

The coefficient has often been used to obtain a value of the correlation 
(so-called) for a contingency table, using some reduction to the four-fold 
form by amalgamating adjacent arrays, or possibly making more than one 
such reduction and averaging the results. As such tables are very often 
far from normal, it is always desirable to test the normality by using more 
than one reduction. In any case the reader should be informed precisely 
as to the reduction used. 

The Product-moment Correlation Coefficient for a 2 × 2 Table.

13.25. The correlation coefficient is in general only calculated for
a table with a considerable number of rows and columns, such as those
given in Chapter 11. In some cases, however, a theoretical value is
obtainable for the coefficient, which holds good even for the limiting case
when there are only two values possible for each variable (e.g. 0 and 1)
and consequently two rows and two columns (cf. Exercises 13.5 and
13.6). It is therefore of some interest to obtain an expression for the
coefficient in this case in terms of the class-frequencies.

Using the notation of Chapters 1-4 the table may be written in the
form:


Values of          Values of First Variable.
Second
Variable.          $X_1$       $X_2$       Total

$Y_1$              (AB)        (αB)        (B)
$Y_2$              (Aβ)        (αβ)        (β)

Total              (A)         (α)          N


Taking the centre of the table as arbitrary origin and the class-interval,
as usual, as the unit, the co-ordinates of the mean are:

$$\xi = \tfrac{1}{2}\{(\alpha) - (A)\}/N, \qquad \eta = \tfrac{1}{2}\{(\beta) - (B)\}/N$$

The standard deviations $\sigma_1$, $\sigma_2$ are given by

$$\sigma_1^2 = 0.25 - \xi^2 = (A)(\alpha)/N^2$$

$$\sigma_2^2 = 0.25 - \eta^2 = (B)(\beta)/N^2$$

Finally,

$$S(xy) = \tfrac{1}{4}\{(AB) + (\alpha\beta) - (A\beta) - (\alpha B)\} - N\xi\eta$$

Writing

$$(AB) - (A)(B)/N = \delta$$

(as in Chapter 3) and replacing $\xi$, $\eta$ by their values, this reduces to

$$S(xy) = \delta$$

Whence

$$r = \frac{N\delta}{\sqrt{(A)(\alpha)(B)(\beta)}} \tag{13.13}$$


This value of r can be used as a coefficient of association, but, unlike
the association coefficient of Chapter 3, which is unity if either (AB) = (A)
or (AB) = (B), r only becomes unity if (AB) = (A) = (B). This is the
only case in which both frequencies (αB) and (Aβ) can vanish so that
(AB) and (αβ) correspond to the frequencies of two points, $X_1 Y_1$, $X_2 Y_2$,
on a line. Obviously this alone renders the numerical values of the two
coefficients quite incomparable with each other. But further, while the
association coefficient is the same for all tables derived from one another
by multiplying rows or columns by arbitrary coefficients, the correlation
coefficient (13.13) is greatest when (A) = (α) and (B) = (β), i.e. when the
table is symmetrical, and its value is lowered when the symmetrical
table is rendered asymmetrical by increasing or reducing the number of
A's or B's. For moderate degrees of association, the association coefficient
gives much the larger values. The two coefficients possess, in fact,
essentially different properties, and are different measures of association
in the same sense that the geometric and arithmetic means are different
forms of average, or the semi-interquartile range and the standard deviation
different measures of dispersion.
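
The contrast between the two coefficients can be checked numerically. The sketch below is only an illustration: the function names and the two tables are hypothetical, the association coefficient is taken to be Q = {(AB)(αβ) - (Aβ)(αB)} / {(AB)(αβ) + (Aβ)(αB)}, and r is computed from (13.13). The second table is the first with the A column multiplied by 3.

```python
from math import sqrt


def fourfold_r(ab, alpha_b, a_beta, alpha_beta):
    """Product-moment r of equation (13.13) for a 2 x 2 table."""
    n = ab + alpha_b + a_beta + alpha_beta
    a, alpha = ab + a_beta, alpha_b + alpha_beta   # column totals (A), (alpha)
    b, beta = ab + alpha_b, a_beta + alpha_beta    # row totals (B), (beta)
    delta = ab - a * b / n                         # (AB) - (A)(B)/N
    return n * delta / sqrt(a * alpha * b * beta)


def association_q(ab, alpha_b, a_beta, alpha_beta):
    """Association coefficient Q (assumed form, see lead-in)."""
    cross_1 = ab * alpha_beta
    cross_2 = alpha_b * a_beta
    return (cross_1 - cross_2) / (cross_1 + cross_2)


# Hypothetical tables, given as (AB), (alpha B), (A beta), (alpha beta).
symmetric = (40, 10, 10, 40)
skewed = (120, 10, 30, 40)   # the A column of `symmetric` multiplied by 3

r_symmetric = fourfold_r(*symmetric)
r_skewed = fourfold_r(*skewed)
q_symmetric = association_q(*symmetric)
q_skewed = association_q(*skewed)
```

On these figures Q is the same for both tables, while r falls from 0.6 for the symmetrical table to a smaller value for the asymmetrical one, in agreement with the remarks above.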

13.26. The student should realise that the product-sum correlation
and the tetrachoric correlation are also two entirely different measures
with quite different properties. The one is in no sense an approximation
to the other, and the two may often differ largely.


Intraclass Correlation. 

13.27. We have previously considered correlations between two
distinct types of variates, such as age and yield of milk in cows, or stature
of father and stature of son; but there occurs, mainly in biological studies,
a rather different kind of correlation which we will now proceed to discuss.

Suppose we are examining the relationship between the heights of
brothers, and consider a pair of brothers. Our two variates will be (1)
the height of the first brother, and (2) the height of the second brother.
The question is, which are we to regard as the first brother and which as
the second? It is not difficult to lay down rules which would enable us
to make a distinction: for instance, we might take the elder brother
first, or the taller brother first. But if we did this and drew up a correlation
table for all such pairs, we should not be answering the question
as to the relation between brothers in general, for we should only get a
correlation between the height of taller brothers and that of shorter
brothers, or the height of elder brothers and the height of younger brothers.

13.28. The relationship of brotherhood is in fact symmetrical; if
A is the brother of B, then B is the brother of A. When we are considering
only the relationship in height implied by relationship of blood,
there is no relevant character to enable us to single out one brother as
the first.

We accordingly treat the problem by taking each pair of brothers in
two ways: (1) with the height of A as the first variate and that of B as
the second, and (2) with the height of B as the first variate and that of
A as the second. Similarly, if there are k brothers in the family, we enter
in the correlation table the results of taking pairs in all possible ways,
which number k(k - 1). For example, if we have a family containing
three brothers with heights 5 ft. 9 in., 5 ft. 10 in. and 5 ft. 11 in., they
may be regarded as giving six pairs of variate values:

5 ft. 9 in. with 5 ft. 10 in.     5 ft. 10 in. with 5 ft. 9 in.

5 ft. 9 in. with 5 ft. 11 in.     5 ft. 11 in. with 5 ft. 9 in.

5 ft. 10 in. with 5 ft. 11 in.    5 ft. 11 in. with 5 ft. 10 in.

13.29. Generally, if we have n families, each with k members, there
will be nk(k - 1) pairs, and hence the same number of entries in the table.

Such a table is called an intraclass correlation table, and the 
correlation between the two variates is called intraclass correlation. 

Tables in which all the families have the same number are of particular 
importance, and we will consider them first. It is, however, permissible 
to apply the term intraclass correlation to the symmetrical table derived 
from families which have different numbers of members. This case we
shall consider in 13.33.

13.30. The intraclass correlation table has certain peculiarities, and 
is not of such a general type as the ordinary table which we have con¬ 
sidered hitherto (and which, for the purposes of distinction, is sometimes 
called an interclass table). 

Let the variate values in the first family be

$$x_{11}, \; x_{12}, \; \ldots, \; x_{1k}$$

those in the second family being

$$x_{21}, \; x_{22}, \; \ldots, \; x_{2k}$$

and so on, those in the nth family being

$$x_{n1}, \; x_{n2}, \; \ldots, \; x_{nk}$$

Consider the mean of the X-variate.

In the table the value $x_{11}$ will be associated as an X-variate with
each of the (k - 1) values $x_{12} \ldots x_{1k}$. Hence it appears (k - 1) times.
Similarly, every other value appears (k - 1) times. Hence the sum of
the marginal row, corresponding to the X-variate, is (k - 1)S(x), the
summation extending over all values. But there are nk(k - 1) members
in the table.

Hence,

$$\bar{X} = \frac{1}{nk(k-1)}(k-1)S(x) = \frac{1}{nk}S(x) \tag{13.14}$$

Similarly,

$$\bar{Y} = \frac{1}{nk}S(x) \tag{13.15}$$

i.e. the means of the variates are the same. This must evidently be the
case, for the table is symmetrical.

For the variance of X we have:

$$\sigma_x^2 = \frac{1}{nk(k-1)}S'(x - \bar{X})^2$$

the summation S' extending over all the entries in the table; and since
each $x - \bar{X}$ occurs (k - 1) times,

$$\sigma_x^2 = \frac{1}{nk}S(x - \bar{X})^2 \tag{13.16}$$

the summation, as before, extending over all the values of x.
Similarly,

$$\sigma_y^2 = \frac{1}{nk}S(x - \bar{X})^2$$

We therefore write

$$\sigma = \sigma_x = \sigma_y$$


13.31. For the correlation coefficient r we have

$$r = \frac{1}{nk(k-1)\sigma^2}S'(x_{ij} - \bar{X})(x_{il} - \bar{X}) \tag{13.17}$$

where the summation S' extends over all the possible pairs.

We can put this formula into a much simpler form.

Consider the terms in (13.17) for which the first term is $(x_{11} - \bar{X})$. They
will be the (k - 1) terms of the following series:

$$(x_{11} - \bar{X})(x_{12} - \bar{X}) + (x_{11} - \bar{X})(x_{13} - \bar{X}) + \ldots + (x_{11} - \bar{X})(x_{1k} - \bar{X})$$

$$= (x_{11} - \bar{X})\{(x_{12} + x_{13} + \ldots + x_{1k}) - (k-1)\bar{X}\}$$

Now write

$$\bar{X}_1 = \frac{1}{k}(x_{11} + x_{12} + \ldots + x_{1k}) \tag{13.18}$$

i.e. $\bar{X}_1$ is the mean of the members of the first family. Then our expression
becomes

$$(x_{11} - \bar{X})\{k\bar{X}_1 - x_{11} - (k-1)\bar{X}\}$$

$$= (x_{11} - \bar{X})\{k(\bar{X}_1 - \bar{X}) - (x_{11} - \bar{X})\}$$

$$= k(x_{11} - \bar{X})(\bar{X}_1 - \bar{X}) - (x_{11} - \bar{X})^2$$

The sum S' of (13.17) will contain nk such terms.

Hence,

$$nk(k-1)\sigma^2 r = kS(\bar{X}_i - \bar{X})(x_{ij} - \bar{X}) - S(x_{ij} - \bar{X})^2 \tag{13.19}$$

the summation extending over all the nk members.

Now,

$$kS(\bar{X}_i - \bar{X})(x_{ij} - \bar{X}) = \text{sum of } n \text{ terms like } k \times k(\bar{X}_1 - \bar{X})(\bar{X}_1 - \bar{X}) = k^2 S''(\bar{X}_i - \bar{X})^2$$

S'' extending over the n families; and

$$S(x_{ij} - \bar{X})^2 = nk\sigma^2$$

Hence, from (13.19),

$$nk(k-1)\sigma^2 r = k^2 S''(\bar{X}_i - \bar{X})^2 - \sigma^2 nk$$

Now $\frac{1}{n}S''(\bar{X}_i - \bar{X})^2$ is the variance of the means of families about the
mean of the whole. Calling this $\sigma_m^2$, we have

$$nk(k-1)\sigma^2 r = k^2 n\sigma_m^2 - \sigma^2 nk$$

or

$$\{1 + r(k-1)\}\sigma^2 = k\sigma_m^2 \tag{13.20}$$

This result gives us the intraclass correlation in terms of the variance of
the distribution (according to either variate) and the variance of the means
of families.

Example 13.4. In five families of 3 the heights of brothers are: 5' 9",
5' 10", 5' 11"; 5' 10", 5' 11", 6' 0"; 5' 11", 6' 0", 6' 1"; 6' 0", 6' 1", 6' 2";
6' 1", 6' 2", 6' 3". Find the intraclass coefficient of correlation.

Here the mean of the whole = 6' 0", and, working in inches,

$$\sigma^2 = \frac{1}{5 \times 3}\{9+4+1+4+1+0+1+0+1+0+1+4+1+4+9\} = \frac{40}{15} = \frac{8}{3}$$

$$\sigma_m^2 = \frac{4+1+0+1+4}{5} = 2$$

Hence, from (13.20),

$$\{1 + 2r\}\tfrac{8}{3} = 3 \times 2$$

$$1 + 2r = 2.25$$

$$r = 0.625$$
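
The arithmetic of Example 13.4 can be checked in two ways: from equation (13.20), and by building the full symmetrical table of ordered pairs and computing the ordinary product-moment coefficient on it. The sketch below does both; the function names are illustrative only, and the heights are those of the example expressed in inches.

```python
from itertools import permutations
from math import sqrt


def intraclass_r_formula(families):
    """r from equation (13.20); every family must have the same size k."""
    k = len(families[0])
    values = [x for fam in families for x in fam]
    grand_mean = sum(values) / len(values)
    var_all = sum((x - grand_mean) ** 2 for x in values) / len(values)
    fam_means = [sum(fam) / k for fam in families]
    var_means = sum((m - grand_mean) ** 2 for m in fam_means) / len(families)
    # {1 + r(k - 1)} sigma^2 = k sigma_m^2, solved for r.
    return (k * var_means / var_all - 1) / (k - 1)


def intraclass_r_pairs(families):
    """r from the explicit table of k(k - 1) ordered pairs per family."""
    pairs = [p for fam in families for p in permutations(fam, 2)]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


# Heights of Example 13.4, in inches.
families = [[69, 70, 71], [70, 71, 72], [71, 72, 73],
            [72, 73, 74], [73, 74, 75]]

r_from_formula = intraclass_r_formula(families)
r_from_pairs = intraclass_r_pairs(families)
n_entries = sum(len(f) * (len(f) - 1) for f in families)  # nk(k - 1)
```

Both routes give r = 0.625, and the table contains nk(k - 1) = 30 entries.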

13.32. We may notice two rather unusual results which follow from
equation (13.20).

In the first place, since $\sigma_m^2$ is not negative,

$$1 + r(k-1) \ge 0$$

and hence,

$$r \ge -\frac{1}{k-1}$$

Thus, whereas the interclass correlation coefficient can vary from -1 to
+1, the intraclass coefficient cannot be less than $-\frac{1}{k-1}$. For example, in
families of three the intraclass coefficient cannot be less than -1/2.

Secondly, let us consider the correlation within a single family, i.e. when
n = 1.

In this case, $\sigma_m^2 = 0$, and hence

$$r = -\frac{1}{k-1}$$

For k = 2, 3, 4, . . . this gives the successive values of r = -1,
-1/2, -1/3, . . . It is clear that the first value is correct, for the two values $x_1$
and $x_2$ determine only two points $(x_1, x_2)$ and $(x_2, x_1)$, and the slope of the line
joining them is negative.

The student should notice that a corresponding negative association
will arise between the first and second members of the pair if all possible
pairs are chosen from a universe in which the variates can assume only two
values, say 0 and 1, or in which only A's and not-A's are distinguished.
We use this result later in 19.36.

13.33. Reverting now to the more general case, suppose we have n
families whose members number $k_1, k_2, \ldots, k_n$.

The ith family contributes $k_i(k_i - 1)$ pairs to the intraclass table, and
hence the total number of pairs is $S\{k_i(k_i - 1)\} = N$, say, the summation
extending over the n families.

Let the variate values be

$$x_{11}, \; x_{12}, \; \ldots, \; x_{1k_1}$$

$$x_{21}, \; x_{22}, \; \ldots, \; x_{2k_2}$$

$$\ldots$$

$$x_{n1}, \; x_{n2}, \; \ldots, \; x_{nk_n}$$

As in 13.30, we see that in the intraclass table each member of the first
family appears $(k_1 - 1)$ times, each of the second $(k_2 - 1)$ times, and so on.
Hence,

$$\bar{X} = \frac{1}{N}S\{(k_i - 1)S'(x_{ij})\} \tag{13.21}$$

the summation S' being carried over all members of the ith family and S
over all families.

Similarly,

$$\sigma^2 = \frac{1}{N}S\{(k_i - 1)S'(x_{ij} - \bar{X})^2\} \tag{13.22}$$

and

$$r = \frac{1}{N\sigma^2}S(x_{ij} - \bar{X})(x_{il} - \bar{X})$$

the summation extending over all possible pairs;
and this, as in 13.31, reduces to

$$N\sigma^2 r = S\{k_i^2(\bar{X}_i - \bar{X})^2\} - SS'(x_{ij} - \bar{X})^2 \tag{13.23}$$

These formulae are considerably more complex than (13.14), (13.16) and
(13.20), but reduce to those forms if $k_i$ is constant for all families.
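
As a check on the general case, the following sketch computes r for families of unequal size from the weighted mean, the weighted variance and the reduced product-sum (between-family term less within-family term) described in 13.33, and compares it with the coefficient obtained from the explicit table of ordered pairs. The function names and the data are hypothetical; a single family of three is included to illustrate the limit -1/(k - 1) of 13.32.

```python
from itertools import permutations
from math import sqrt


def intraclass_r_general(families):
    """Intraclass r for families of unequal size (13.33)."""
    n_pairs = sum(len(f) * (len(f) - 1) for f in families)  # N
    # Each member of family i appears (k_i - 1) times in the table.
    mean = sum((len(f) - 1) * sum(f) for f in families) / n_pairs
    var = sum((len(f) - 1) * sum((x - mean) ** 2 for x in f)
              for f in families) / n_pairs
    # S{k_i^2 (family mean - mean)^2}, written as (family total - k_i mean)^2.
    between = sum((sum(f) - len(f) * mean) ** 2 for f in families)
    # S S'(x_ij - mean)^2 over every member of every family.
    within = sum((x - mean) ** 2 for f in families for x in f)
    return (between - within) / (n_pairs * var)


def intraclass_r_table(families):
    """The same coefficient from the explicit table of ordered pairs."""
    pairs = [p for f in families for p in permutations(f, 2)]
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


# Hypothetical families of 2, 3 and 4 members.
families = [[60, 62], [64, 65, 69], [61, 66, 67, 70]]
r_formula = intraclass_r_general(families)
r_table = intraclass_r_table(families)

# A single family of three: r should equal -1/(k - 1) = -1/2.
r_single = intraclass_r_general([[1, 2, 3]])
```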


SUMMARY. 

1. In cases where the data are incomplete, or in order to avoid lengthy
calculation, it is possible to use various methods of approximating to the
product-moment coefficient of correlation, provided that the regression is
approximately linear.

2. Cases in which the regression is non-linear can sometimes be reduced
to the linear case by a suitable transformation of the variates.

3. The correlation ratio of X on Y is given by

$$\eta_{xy}^2 = 1 - \frac{\sigma_{ax}^2}{\sigma_x^2} = \frac{\sigma_{mx}^2}{\sigma_x^2}$$

where $\sigma_x^2$ is the variance of X, $\sigma_{ax}^2$ is the weighted average of the
variances of arrays and $\sigma_{mx}^2$ the variance of the means of Y-arrays,
weighted according to the number of individuals in the arrays.

4. $\eta_{xy}^2 - r^2$ cannot be negative, and if it is zero the regression of X on Y
is linear.

5. The rank correlation coefficient is given by

$$\rho = \frac{S(xy)}{\sqrt{S(x^2)S(y^2)}}$$

where x and y are the deviations of the ranks X and Y from the mean
$\frac{n+1}{2}$.

6. If $d_i = (X_i - Y_i)$,

$$\rho = 1 - \frac{6S(d^2)}{n^3 - n}$$


7. The coefficient of intraclass correlation is given by

$$\{1 + r(k-1)\}\sigma^2 = k\sigma_m^2$$

where $\sigma$ is the standard deviation of X and Y, and $\sigma_m$ is the standard
deviation of the means of families, there being n families each of
k members.
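
The two expressions for the rank correlation coefficient in items 5 and 6 may be compared on a small case. The sketch below is illustrative only: the function names and the two rankings of five individuals are hypothetical, and the rankings are assumed to contain no ties.

```python
from math import sqrt


def rho_product_moment(ranks_x, ranks_y):
    """Item 5: product-moment coefficient of the rank deviations."""
    n = len(ranks_x)
    mean = (n + 1) / 2
    x = [r - mean for r in ranks_x]
    y = [r - mean for r in ranks_y]
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def rho_differences(ranks_x, ranks_y):
    """Item 6: rho = 1 - 6 S(d^2) / (n^3 - n)."""
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


# Hypothetical rankings of five individuals (no ties).
ranks_x = [1, 2, 3, 4, 5]
ranks_y = [2, 1, 4, 3, 5]

rho_a = rho_product_moment(ranks_x, ranks_y)
rho_b = rho_differences(ranks_x, ranks_y)
```

Here S(d²) = 4, and both forms give a coefficient of 0.8.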





EXERCISES. 

13.1. Find to 3 places of decimals the correlation ratio of X on Y and of Y on X
for the distribution of cows of Table 11.4, page 200 (r = +0.219). Hence, show
that

1/%-r 1 0 011 

' 4-' 2 <>020 

13.2. Find the correlation ratios of the distribution of marriages of Table 11.2.

13.3. In a test of ability to distinguish shades of colour, 15 discs of various
shades, whose true orders are 1, 2, . . . 15, are arranged by a subject in the
order 7, 4, 2, 3, 1, 10, 6, 8, 9, 5, 11, 15, 14, 12, 13. Find the rank correlation
coefficient between the real and the observed ranks.

13.4. Ten competitors in a beauty contest are ranked by three judges in the
orders

1, 6, 5, 10, 3, 2, 4, 9, 7, 8
3, 5, 8, 4, 7, 10, 2, 1, 6, 9
6, 4, 9, 8, 1, 2, 3, 10, 5, 7

Use the rank correlation coefficient to discuss which pair of judges have the
nearest approach to common tastes in beauty.

13.5. (Cf. Pearson, "On a Generalised Theory of Alternative Inheritance,"
Phil. Trans., vol. 203, A, 1904, p. 53.) If we consider the correlation between
number of recessive couplets in parent and in offspring, in a Mendelian population
breeding at random (such as would ultimately result from an initial cross between
a pure dominant and a pure recessive), the correlation is found to be 1/3 for a
total number of couplets n. If n = 1, the only possible numbers of recessive
couplets are 0 and 1, and the correlation table between parent and offspring
reduces to the form

Offspring.  |     Parent.     |
            |    0   |    1   |  Total
------------+--------+--------+--------
     0      |    5   |    1   |    6
     1      |    1   |    1   |    2
------------+--------+--------+--------
   Total    |    6   |    2   |    8

Verify the correlation, and work out the association coefficient Q.

13.6. (Cf. the above, and also Snow, Proc. Roy. Soc., vol. 83, B, 1910, Table 3,
p. 42.) For a similar population the correlation between brothers, assuming a
practically infinite size of family, is 5/12. The table is

Second      |  First Brother. |
Brother.    |    0   |    1   |  Total
------------+--------+--------+--------
     0      |   41   |    7   |   48
     1      |    7   |    9   |   16
------------+--------+--------+--------
   Total    |   48   |   16   |   64

Verify the correlation, and work out the association coefficient Q.











13.7. Referring to the notation of 13.25, show that we have the following
expressions for the regressions in a fourfold table:

$$r\frac{\sigma_1}{\sigma_2} = \frac{N\delta}{(B)(\beta)} = \frac{(AB)}{(B)} - \frac{(A\beta)}{(\beta)}$$

$$r\frac{\sigma_2}{\sigma_1} = \frac{N\delta}{(A)(\alpha)} = \frac{(AB)}{(A)} - \frac{(\alpha B)}{(\alpha)}$$

Verify on the tables of Exercises 13.5 and 13.6.

13.8. In four pea-pods, each containing eight peas, the weights of the peas
are, in hundredths of a gramme: 43, 40, 48, 42, 50, 45, 45 and 49; 33, 34, 37, 39,
32, 35, 37 and 41; 56, 52, 50, 51, 54, 52, 49 and 52; 36, 37, 38, 40, 40, 41, 44
and 44. Find the coefficient of intraclass correlation.

13.9. (Data from O. H. Latter, Biometrika, vol. 4, 1905, p. 363.)

The following table shows the length of cuckoos' eggs fostered by various
birds:

Length of Egg (units 1/2 millimetre).


Foster Parent 

. 40 

41 

12 

43 

41 

15 

46 

47 


49 

50 

Totals 

Robin 

1 i 

1 1 

8 


u 

in 

20 

6 

11 

o 1 

~ I 

*> 

76 

Wren 

1 r> 

* 1 * 

1— - 

; 

it 

8 

i 

9 


3 



1 


54 

Hedge-iSpari ovv . 


1 

2 

Cii 

1 1 1 

•_ 

■ » i 

18 

! 8 

5 


8 ! 

58 

Totals 

1 “ 
. i 8 

1 

1 "1 

24 

16 i 

! 1 

82 

i , i 

32 1 

86 

( 11 

i"‘i 

2 

5 

188 


Find the coefficient of intraclass correlation, and state how many entries
there would be in the intraclass correlation table.




CHAPTER 14. 


PARTIAL CORRELATION. 

Multiple Correlation. 

14.1. In Chapters 11 to 13 we developed the theory of the correlation
between a single pair of variables. But in the case of statistics of
attributes we found it necessary to proceed from the theory of simple
association for a single pair of attributes to the theory of association for
several attributes, in order to be able to deal with the complex causation
characteristic of statistics; and similarly the student will find it impossible
to advance very far in the discussion of many problems in correlation
without some knowledge of the theory of multiple correlation, or correlation
between several variables.

For example, in considering the relationship between pauperism, out-relief
and the age of recipients of relief, it might be found that changes
in pauperism were highly correlated (positively) with changes in the out-relief
ratio, and also with changes in the proportion of the old; and the
question might arise how far the first correlation was due merely to a
tendency to give out-relief more freely to the old than the young, i.e. to a
correlation between changes in out-relief and changes in proportion of the
old. The question could not at the present stage be answered by working
out the correlation coefficient between the last pair of variables, for we
have as yet no guide as to how far a correlation between the variables
1 and 2 can be accounted for by correlations between 1 and 3 and
2 and 3.

Again, a marked positive correlation might be observed between, say,
the bulk of a crop and the rainfall during a certain period, and practically
no correlation between the crop and the accumulated temperature
during the same period; and the question might arise whether the last
result might not be due merely to a negative correlation between rain and
accumulated temperature, the crop being favourably affected by an
increase of accumulated temperature if other things were equal, but failing
as a rule to obtain this benefit owing to the concomitant deficiency of rain.
In the problem of inheritance in a population, the corresponding problem
is of great importance, as already indicated in Chapter 4. It is essential
for the discussion of possible hypotheses to know whether an observed
correlation between, say, grandson and grandparent can or cannot be
accounted for solely by observed correlations between grandson and parent,
parent and grandparent.

Partial Regressions and Correlation Coefficients. 

14.2. Problems of this type, in which it is necessary to consider
simultaneously the relations between at least three variables, and possibly
more, may be treated by a simple and natural extension of the method
used in the case of two variables. The latter case was discussed by forming
linear equations between the two variables, assigning such values
to the constants as to make the sum of the squares of the errors of estimate
as low as possible: the more complicated case may be discussed by
forming linear equations between any one of the n variables involved,
taking each in turn, and the n - 1 others, again assigning such values to
the constants as to make the sum of the squares of the errors of estimate
a minimum. If the variables are $X_1, X_2, X_3, \ldots, X_n$, the equation will
be of the form

$$X_1 = a + b_2 X_2 + b_3 X_3 + \ldots + b_n X_n$$

If in such a generalised regression equation we find a sensible positive
value for any one coefficient such as $b_2$, we know that there must be a
positive correlation between $X_1$ and $X_2$ that cannot be accounted for by
mere correlations of $X_1$ and $X_2$ with $X_3$, $X_4$ or $X_n$, for the effects of
changes in these variables are allowed for in the remaining terms on the
right. The magnitude of $b_2$ gives, in fact, the mean change in $X_1$
associated with a unit change in $X_2$ when all the remaining variables are
kept constant.

The correlation between $X_1$ and $X_2$ indicated by $b_2$ may be termed
a partial correlation, as corresponding with the partial association of
Chapter 4, and it is required to deduce from the values of the coefficients
b, which may be termed partial regressions, partial coefficients of
correlation giving the correlation between $X_1$ and $X_2$ or other pair of
variables when the remaining variables $X_3 \ldots X_n$ are kept constant, or
when changes in these variables are corrected or allowed for, so far as
this may be done with a linear equation. For examples of such generalised
regression equations the student may turn to the illustrations worked out
later (pp. 270-275).

14.3. With this explanatory introduction, we may now proceed to
the algebraic theory of such generalised regression equations and of
multiple correlation in general. It will first, however, be as well to revert
briefly to the case of two variables. In Chapter 11, to obtain the greatest
possible simplicity of treatment, the value of the coefficient $r = p/(\sigma_1\sigma_2)$
was deduced on the special assumption that the means of all arrays were
strictly collinear, and the meaning of the coefficient in the more general
case was subsequently investigated. Such a process is not conveniently
applicable when a number of variables are to be taken into account, and
the problem has to be faced directly: i.e. required, to determine the
coefficients and constant term, if any, in a regression equation, so as to make
the sum of the squares of the errors of estimate a minimum.

14.4. To solve this problem we proceed as in 11.20.

Let us measure the variates $X_1 \ldots X_n$ from their respective means,
denoting the quantities so obtained by $x_1 \ldots x_n$.

Then the regression equation of, say, $x_1$ on $x_2 \ldots x_n$ may be written
in the form

$$x_1 = a_1 + b_2 x_2 + b_3 x_3 + \ldots + b_n x_n$$

We have to find $a_1, b_2, \ldots, b_n$ such that

$$E_1 = S(x_1 - a_1 - b_2 x_2 - \ldots - b_n x_n)^2$$

is a minimum, the summation taking place over all sets of values of
$x_1 \ldots x_n$.

Now,

$$E_1 = S(a_1^2) + S(x_1 - b_2 x_2 - \ldots - b_n x_n)^2$$

the product term

$$2S\{a_1(x_1 - b_2 x_2 - \ldots - b_n x_n)\}$$

vanishing, since $x_1$, etc. are measured from the mean.

Hence we have, for the minimum value of $E_1$,

$$a_1 = 0$$

Now, if $b_2$ is chosen so that $E_1$ is a minimum, the value of $E_1$ when
$(b_2 + \delta)$ is substituted for $b_2$ is increased no matter how small $\delta$ may be;
i.e.

$$S\{x_1 - (b_2 + \delta)x_2 - \ldots - b_n x_n\}^2 \ge S(x_1 - b_2 x_2 - \ldots - b_n x_n)^2$$

Expanding the left-hand side, and neglecting $\delta^2$, which can be made as
small as we please compared with $\delta$,

$$S(x_1 - b_2 x_2 - \ldots - b_n x_n)^2 - 2S\{x_2(x_1 - b_2 x_2 - \ldots - b_n x_n)\}\delta \ge S(x_1 - b_2 x_2 - \ldots - b_n x_n)^2$$

or

$$S\{x_2(x_1 - b_2 x_2 - \ldots - b_n x_n)\}\delta \le 0$$

Now this is to be true for all small values of $\delta$, positive or negative.
If $S\{x_2(x_1 - b_2 x_2 - \ldots - b_n x_n)\}$ were not zero, this would be impossible,
for if it were positive, say, we could take $\delta$ positive and the inequality
would not be satisfied.

Hence,

$$S\{x_2(x_1 - b_2 x_2 - \ldots - b_n x_n)\} = 0$$

Similarly, considering $b_3$ instead of $b_2$, we have

$$S\{x_3(x_1 - b_2 x_2 - \ldots - b_n x_n)\} = 0$$

and so on, there being (n - 1) equations. These are sufficient to determine
the (n - 1) quantities $b_2 \ldots b_n$, and hence our problem is solved.
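
The (n - 1) equations of 14.4 are linear in the coefficients b and can be solved directly. The sketch below is one possible arrangement and not the procedure recommended later in the chapter: the observations are hypothetical, the helper names are illustrative, and the equations are solved by plain Gauss-Jordan elimination. It then confirms that the product-sum of each of $x_2$, $x_3$ with the error of estimate vanishes.

```python
def centre(column):
    """Measure a variate from its mean."""
    m = sum(column) / len(column)
    return [v - m for v in column]


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(n):
            if i != col:
                factor = aug[i][col] / aug[col][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def partial_regressions(x1, others):
    """Least-squares b_2 ... b_n for x_1 on the remaining variates."""
    y = centre(x1)
    xs = [centre(col) for col in others]
    # Row j expresses S{x_j (x_1 - b_2 x_2 - ... - b_n x_n)} = 0.
    matrix = [[sum(a * b for a, b in zip(xi, xj)) for xj in xs] for xi in xs]
    rhs = [sum(a * b for a, b in zip(xi, y)) for xi in xs]
    b = solve(matrix, rhs)
    residuals = [yv - sum(bj * xj[t] for bj, xj in zip(b, xs))
                 for t, yv in enumerate(y)]
    return b, xs, residuals


# Hypothetical observations of three variates.
X1 = [10.0, 12.0, 15.0, 11.0, 18.0, 20.0]
X2 = [1.0, 2.0, 3.0, 2.0, 5.0, 6.0]
X3 = [7.0, 5.0, 6.0, 8.0, 3.0, 4.0]

b, xs, residuals = partial_regressions(X1, [X2, X3])
product_sums = [sum(a * e for a, e in zip(xj, residuals)) for xj in xs]
```

Each entry of `product_sums` is zero to within rounding error, which is the condition the least-squares values of the b's must satisfy.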

Notation. 

14.5. At this point we introduce a flexible notation which will enable
us to consider any regression equation.

We write:

$$x_1 = b_{12.34\ldots n}\, x_2 + b_{13.24\ldots n}\, x_3 + \ldots + b_{1n.23\ldots(n-1)}\, x_n \tag{14.1}$$

The quantities b are partial regression coefficients. The first subscript
attached to the b is the subscript of the letter on the left (the dependent
variable). The second subscript is that of the x to which it is attached.
These are called primary subscripts.

After the primary subscripts, and separated from them by a point,
are placed the subscripts of the remaining variables on the right. These
are called secondary subscripts.

Equation (14.1) is the regression equation of $x_1$. Similarly, in accordance
with the rules we have just laid down, we have:

$$x_2 = b_{21.34\ldots n}\, x_1 + b_{23.14\ldots n}\, x_3 + \ldots + b_{2n.13\ldots(n-1)}\, x_n$$

and so on.

It should be noted that the order in which the secondary subscripts are
written is immaterial; but this is not true of the primary subscripts; e.g.
$b_{12.3\ldots n}$ and $b_{21.3\ldots n}$ denote quite distinct coefficients, $x_1$ being the
dependent variable in the first case and $x_2$ in the second.

A coefficient with p secondary subscripts may be termed a regression
of the pth order. The regressions $b_{12}$, $b_{21}$, $b_{13}$, $b_{31}$, etc., obtained by
considering two variables alone, may be regarded as of order zero, and may
be termed total, as distinct from partial, regressions.

14.6. If the regressions $b_{12.34\ldots n}$, $b_{13.24\ldots n}$, etc., be assigned the
"best" values, as determined by the method of least squares, the difference
between the actual value of $x_1$ and the value assigned by the right-hand
side of the regression equation (14.1), that is, the error of estimate, will be
denoted by $x_{1.23\ldots n}$; i.e. as a definition we have

$$x_{1.23\ldots n} = x_1 - b_{12.34\ldots n}\, x_2 - b_{13.24\ldots n}\, x_3 - \ldots - b_{1n.23\ldots(n-1)}\, x_n \tag{14.2}$$

where $x_1, x_2, \ldots, x_n$ are assigned any one set of observed values. Such an
error (or residual, as it is sometimes called), denoted by a symbol with p
secondary suffixes, will be termed a deviation of the pth order.

Finally, we will define a generalised standard deviation $\sigma_{1.23\ldots n}$ by
the equation

$$N\sigma_{1.23\ldots n}^2 = S(x_{1.23\ldots n}^2) \tag{14.3}$$

N being, as usual, the number of observations. A standard deviation
denoted by a symbol with p secondary suffixes will be termed a standard
deviation of the pth order, the standard deviations $\sigma_1$, $\sigma_2$, etc., being
regarded as of order zero, the standard deviations $\sigma_{1.2}$, $\sigma_{2.1}$, etc., of the first
order, and so on.

14.7. In the case of two variables, the correlation coefficient $r_{12}$ may
be regarded as defined by the equation

$$r_{12} = (b_{12} b_{21})^{1/2}$$

We shall generalise this equation in the form

$$r_{12.34\ldots n} = (b_{12.34\ldots n}\, b_{21.34\ldots n})^{1/2} \tag{14.4}$$

This is at present a pure definition of a new symbol, and it remains to be
shown that $r_{12.34\ldots n}$ may really be regarded as, and possesses all the properties
of, a correlation coefficient; the name may, however, be applied
to it, pending the proof. A correlation coefficient with p secondary
subscripts will be termed a correlation of order p. Evidently, in the
case of a correlation coefficient, the order in which both primary and
secondary subscripts are written is indifferent, for the right-hand side of
equation (14.4) is unaltered by writing 2 for 1 and 1 for 2. The correlations
$r_{12}$, $r_{13}$, etc., may be regarded as of order zero, and spoken of as total,
as distinct from partial, correlations.








The Normal Equations. 

14.8. All the quantities we have just defined are expressible in terms
of the total and partial regression coefficients, and particular importance
therefore attaches to the equations which give those coefficients. The
equations of 14.4 may be written

$$S(x_2\, x_{1.23\ldots n}) = 0 \tag{14.5}$$

etc., there being (n - 1) equations for each regression equation.

These equations are called the normal equations. We shall see
below that in practical cases it is usually more convenient not to solve them
directly but to proceed in stages, finding first the regressions and correlations
of order zero, then those of order 1, and so on.

14.9. If the student will follow the process by which (14.5) was
obtained, he will see that when the condition is expressed that $b_{12.34\ldots n}$
shall possess the "least-square" value, $x_2$ enters into the product-sum with
$x_{1.23\ldots n}$; when the same condition is expressed for $b_{13.24\ldots n}$, $x_3$ enters
into the product-sum, and so on. Taking each regression in turn, in fact,
every x the suffix of which is included in the secondary suffixes of $x_{1.23\ldots n}$
enters into the product-sum. The normal equations of the form (14.5) are
therefore equivalent to the theorem:

The product-sum of any deviation of order zero with any deviation of higher
order is zero, provided the subscript of the former occur among the secondary
subscripts of the latter.

14.10. But it follows from this that

$$S(x_{1.34\ldots n}\, x_{2.34\ldots n}) = S\{x_{1.34\ldots n}(x_2 - b_{23.4\ldots n}\, x_3 - \ldots - b_{2n.34\ldots(n-1)}\, x_n)\} = S(x_{1.34\ldots n}\, x_2)$$

Similarly,

$$S(x_{1.34\ldots n}\, x_{2.34\ldots n}) = S(x_1\, x_{2.34\ldots n})$$

Similarly again,

$$S(x_{1.34\ldots n}\, x_{2.34\ldots(n-1)}) = S(x_{1.34\ldots n}\, x_2)$$

and so on. Therefore, quite generally,

$$S(x_{1.34\ldots n}\, x_{2.34\ldots n}) = S(x_{1.34\ldots(n-1)}\, x_{2.34\ldots n}) = S(x_1\, x_{2.34\ldots n}) = S(x_{1.34\ldots n}\, x_{2.34\ldots(n-1)}) = S(x_{1.34\ldots n}\, x_2) \tag{14.6}$$

Comparing all the equal product-sums that may be obtained in this way,
we see that the product-sum of any two deviations is unaltered by omitting any
or all of the secondary subscripts of either which are common to the two, and,
conversely, the product-sum of any deviation of order p with a deviation of
order p + q, the p subscripts being the same in each case, is unaltered by adding
to the secondary subscripts of the former any or all of the q additional subscripts
of the latter.





It follows therefore from (14.5) that any product-sum is zero if all the
subscripts of the one deviation occur among the secondary subscripts of the
other. As the simplest case, we may note that $x_1$ is uncorrelated with
$x_{2.1}$, and $x_2$ uncorrelated with $x_{1.2}$.

The theorems of this and of the preceding paragraph are of fundamental
importance, and should be carefully remembered.

14.11. We can now show that the quantities r defined by (14.4) are
really coefficients of correlation. In fact we have, from the results of 14.9
and 14.10,

$$0 = S(x_{2.34\ldots n}\, x_{1.234\ldots n})$$

$$= S\{x_{2.34\ldots n}(x_1 - b_{12.34\ldots n}\, x_2 - \text{terms in } x_3 \text{ to } x_n)\}$$

$$= S(x_1\, x_{2.34\ldots n}) - b_{12.34\ldots n}\, S(x_2\, x_{2.34\ldots n})$$

$$= S(x_{1.34\ldots n}\, x_{2.34\ldots n}) - b_{12.34\ldots n}\, S(x_{2.34\ldots n}^2)$$

That is,

$$b_{12.34\ldots n} = \frac{S(x_{1.34\ldots n}\, x_{2.34\ldots n})}{S(x_{2.34\ldots n}^2)} \tag{14.7}$$

But this is the value that would have been obtained by taking a regression
equation of the form

$$x_{1.34\ldots n} = b_{12.34\ldots n}\, x_{2.34\ldots n}$$

and determining $b_{12.34\ldots n}$ by the method of least squares, i.e. $b_{12.34\ldots n}$
is the regression of $x_{1.34\ldots n}$ on $x_{2.34\ldots n}$. It follows at once from
(14.4) that $r_{12.34\ldots n}$ is the correlation between $x_{1.34\ldots n}$ and $x_{2.34\ldots n}$,
and from (14.3) that we may write

$$b_{12.34\ldots n} = r_{12.34\ldots n}\frac{\sigma_{1.34\ldots n}}{\sigma_{2.34\ldots n}} \tag{14.8}$$

an equation identical with the familiar relation $b_{12} = r_{12}\sigma_1/\sigma_2$, with the
secondary suffixes 34 . . . n added throughout.

To illustrate the meaning of the equation by the simplest case, if we had
three variables only, $x_1$, $x_2$ and $x_3$, the value of $b_{12.3}$ or $r_{12.3}$ could be
determined (1) by finding the correlations $r_{13}$ and $r_{23}$ and the corresponding
regressions $b_{13}$ and $b_{23}$; (2) working out the residuals $x_1 - b_{13}x_3$ and $x_2 - b_{23}x_3$
for all associated deviations; (3) working out the correlation between the
residuals associated with the same values of $x_3$. The method would not,
however, be a practical one, as the arithmetic would be extremely lengthy,
much more lengthy than the method given below for expressing a correlation
of order p in terms of correlations of order p - 1.
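
Although lengthy by hand, the three-step residual method for three variables is easily carried out by machine. The sketch below follows steps (1) to (3) on hypothetical observations and compares the result with the usual first-order formula $r_{12.3} = (r_{12} - r_{13}r_{23})/\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}$; that formula is not derived in this paragraph and is quoted here only as a check. The helper names are illustrative.

```python
from math import sqrt


def centre(column):
    m = sum(column) / len(column)
    return [v - m for v in column]


def s(u, v):
    """Product-sum S(uv)."""
    return sum(a * b for a, b in zip(u, v))


def corr(u, v):
    return s(u, v) / sqrt(s(u, u) * s(v, v))


# Hypothetical observations, measured from their means.
x1 = centre([10.0, 12.0, 15.0, 11.0, 18.0, 20.0])
x2 = centre([1.0, 2.0, 3.0, 2.0, 5.0, 6.0])
x3 = centre([7.0, 5.0, 6.0, 8.0, 3.0, 4.0])

# Step (1): total regressions on x_3.
b13 = s(x1, x3) / s(x3, x3)
b23 = s(x2, x3) / s(x3, x3)

# Step (2): residuals x_{1.3} and x_{2.3}.
x1_3 = [a - b13 * c for a, c in zip(x1, x3)]
x2_3 = [a - b23 * c for a, c in zip(x2, x3)]

# Step (3): correlation and regression between the residuals, as in (14.7).
r12_3_residuals = corr(x1_3, x2_3)
b12_3 = s(x1_3, x2_3) / s(x2_3, x2_3)

# Check against the first-order formula in the total correlations.
r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)
r12_3_formula = (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
```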


Expression of Standard Deviation in terms of Standard Deviations 
and Coefficients of Lower Orders. 

14.12. Any standard deviation of order p may be expressed in terms of a
standard deviation of order p - 1 and a correlation of order p - 1. For,

$$S(x_{1.23\ldots n}^2) = S(x_{1.23\ldots(n-1)}\, x_{1.23\ldots n})$$

$$= S\{x_{1.23\ldots(n-1)}(x_1 - b_{1n.23\ldots(n-1)}\, x_n - \text{terms in } x_2 \text{ to } x_{n-1})\}$$

$$= S(x_{1.23\ldots(n-1)}^2) - b_{1n.23\ldots(n-1)}\, S(x_{1.23\ldots(n-1)}\, x_{n.23\ldots(n-1)})$$

or, dividing through by the number of observations:

$$\sigma_{1.23\ldots n}^2 = \sigma_{1.23\ldots(n-1)}^2(1 - b_{1n.23\ldots(n-1)}\, b_{n1.23\ldots(n-1)})$$

$$= \sigma_{1.23\ldots(n-1)}^2(1 - r_{1n.23\ldots(n-1)}^2) \tag{14.9}$$

This is again the relation of the familiar form

$$\sigma_{1.n}^2 = \sigma_1^2(1 - r_{1n}^2)$$

with the secondary suffixes 23 . . . (n - 1) added throughout. It is clear
from (14.9) that $r_{1n.23\ldots(n-1)}$, like any correlation of order zero, cannot be
numerically greater than unity. It also follows at once that if we have
been estimating $x_1$ from $x_2, x_3, \ldots, x_{n-1}$, $x_n$ will not increase the accuracy
of estimate unless $r_{1n.23\ldots(n-1)}$ (not $r_{1n}$) differ from zero. This condition
is somewhat interesting, as it leads to rather unexpected results. For
example, if $r_{12} = +0.8$, $r_{13} = +0.4$, $r_{23} = +0.5$, it will not be possible to
estimate $x_1$ with any greater accuracy from $x_2$ and $x_3$ than from $x_2$ alone,
for the value of $r_{13.2}$ is zero (see below, 14.15).
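
The numerical example just given can be verified in a few lines. The sketch below assumes the usual first-order formula $r_{13.2} = (r_{13} - r_{12}r_{23})/\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}$, which belongs to the later paragraph cited and is not derived here, and then applies (14.9) with n = 3 to compare the variance of the error of estimate with and without $x_3$. The variable names are illustrative.

```python
from math import sqrt

# Total correlations of the example in 14.12.
r12, r13, r23 = 0.8, 0.4, 0.5

# First-order partial correlation of x_1 and x_3, x_2 being eliminated
# (assumed standard formula, see the lead-in).
r13_2 = (r13 - r12 * r23) / sqrt((1 - r12 ** 2) * (1 - r23 ** 2))

# Ratios sigma_{1.2}^2 / sigma_1^2 and sigma_{1.23}^2 / sigma_1^2, by (14.9).
ratio_from_x2 = 1 - r12 ** 2
ratio_from_x2_x3 = ratio_from_x2 * (1 - r13_2 ** 2)
```

Since $r_{13.2}$ vanishes, the two ratios are both 0.36: bringing in $x_3$ leaves the standard error of estimate unchanged, although $r_{13}$ itself is +0.4.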

14.13. It should be noted that, in equation (14.9), any other subscript
can be eliminated in the same way as subscript n from the suffix of
$\sigma_{1.23\ldots n}$, so that a standard deviation of order p can be expressed in p
ways in terms of standard deviations of the next lower order. This is useful
as affording an independent check on arithmetic. Further, $\sigma_{1.23\ldots(n-1)}$
can be expressed in the same way in terms of $\sigma_{1.23\ldots(n-2)}$, and so on, so
that we must have

$$\sigma_{1.23\ldots n}^2 = \sigma_1^2(1 - r_{12}^2)(1 - r_{13.2}^2)(1 - r_{14.23}^2) \ldots (1 - r_{1n.23\ldots(n-1)}^2) \tag{14.10}$$

This is an extremely convenient expression for arithmetical use; the
arithmetic can again be subjected to an absolute check by eliminating the
subscripts in a different, say the inverse, order. Apart from the algebraic
proof, it is obvious that the values must be identical; for if we are estimating
one variable from n others, it is clearly indifferent in what order the
latter are taken into account.

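$\sigma_{1.23\ldots n}$ can also be expressed in terms of $\sigma_1$ and the total correlation coefficients. We have

$$S(x_{1.23\ldots n})^2 = S(x_1\,x_{1.23\ldots n})$$

Hence, expanding $x_{1.23\ldots n}$,

$$\sigma_1^2 - b_{12.3\ldots n}\,r_{12}\sigma_1\sigma_2 - b_{13.2\ldots n}\,r_{13}\sigma_1\sigma_3 - \ldots = \sigma^2_{1.23\ldots n}$$

The $(n-1)$ normal equations involving $x_{1.23\ldots n}$ are $S(x_2\,x_{1.23\ldots n}) = 0$, etc., i.e. expanding,

$$
\begin{aligned}
r_{21}\sigma_1\sigma_2 - b_{12.3\ldots n}\,\sigma_2^2 - b_{13.2\ldots n}\,r_{23}\sigma_2\sigma_3 - \ldots &= 0 \\
r_{31}\sigma_1\sigma_3 - b_{12.3\ldots n}\,r_{32}\sigma_3\sigma_2 - b_{13.2\ldots n}\,\sigma_3^2 - \ldots &= 0, \text{ etc.}
\end{aligned}
$$

Regarding the $n$ equations so obtained as equations in the quantities $b$, we have, on elimination, the determinant

$$
\begin{vmatrix}
\sigma_1^2 - \sigma^2_{1.23\ldots n} & r_{12}\sigma_1\sigma_2 & r_{13}\sigma_1\sigma_3 & \ldots & r_{1n}\sigma_1\sigma_n \\
r_{21}\sigma_2\sigma_1 & \sigma_2^2 & r_{23}\sigma_2\sigma_3 & \ldots & r_{2n}\sigma_2\sigma_n \\
\vdots & \vdots & \vdots & & \vdots \\
r_{n1}\sigma_n\sigma_1 & r_{n2}\sigma_n\sigma_2 & r_{n3}\sigma_n\sigma_3 & \ldots & \sigma_n^2
\end{vmatrix} = 0
$$

Dividing the $s$th row by $\sigma_s$ and the $t$th column by $\sigma_t$, this gives:

$$
\begin{vmatrix}
1 - \dfrac{\sigma^2_{1.23\ldots n}}{\sigma_1^2} & r_{12} & r_{13} & \ldots & r_{1n} \\
r_{21} & 1 & r_{23} & \ldots & r_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
r_{n1} & r_{n2} & r_{n3} & \ldots & 1
\end{vmatrix} = 0
$$

Write $\omega$ for the determinant

$$
\omega = \begin{vmatrix}
1 & r_{12} & \ldots & r_{1n} \\
r_{21} & 1 & \ldots & r_{2n} \\
\vdots & \vdots & & \vdots \\
r_{n1} & r_{n2} & \ldots & 1
\end{vmatrix}
$$

and let $\omega_{11}$ be the minor of the term in the first row and column. Then

$$\frac{\sigma^2_{1.23\ldots n}}{\sigma_1^2} = \frac{\omega}{\omega_{11}} \qquad (14.11)$$

Similarly,

$$\frac{\sigma^2_{2.13\ldots n}}{\sigma_2^2} = \frac{\omega}{\omega_{22}}$$

and so on.

These results exhibit $\sigma^2_{1.23\ldots n}$, etc., in a symmetrical form.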
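The agreement between the product form (14.10) and the determinant form (14.11) can be checked numerically. The sketch below is only illustrative: it uses the three total correlations quoted in 14.12 ($r_{12} = 0.8$, $r_{13} = 0.4$, $r_{23} = 0.5$), and the helpers `det` and `minor` are small hypothetical routines written for this check, not anything taken from the text.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(a[i][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d  # a row interchange changes the sign
        d *= a[c][c]
        for i in range(c + 1, n):
            f = a[i][c] / a[c][c]
            for j in range(c, n):
                a[i][j] -= f * a[c][j]
    return d

def minor(m, i, j):
    """Matrix m with row i and column j struck out."""
    return [[v for jj, v in enumerate(row) if jj != j]
            for ii, row in enumerate(m) if ii != i]

r12, r13, r23 = 0.8, 0.4, 0.5
R = [[1.0, r12, r13],
     [r12, 1.0, r23],
     [r13, r23, 1.0]]

# (14.11): sigma^2_{1.23} / sigma_1^2 = omega / omega_11
ratio_det = det(R) / det(minor(R, 0, 0))

# (14.10): sigma^2_{1.23} / sigma_1^2 = (1 - r12^2)(1 - r13.2^2)
r13_2 = (r13 - r12 * r23) / math.sqrt((1 - r12**2) * (1 - r23**2))
ratio_prod = (1 - r12**2) * (1 - r13_2**2)

print(ratio_det, ratio_prod)
```

Both routes give the same fraction of $\sigma_1^2$, so that here $\sigma_{1.23}$ is six-tenths of $\sigma_1$.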


Expression of Regression Coefficients in terms of Coefficients of 
Lower Orders. 

14.14. Any regression of order $p$ may be expressed in terms of regressions of order $p-1$. For we have:

$$
\begin{aligned}
S(x_{1.34\ldots n}\,x_{2.34\ldots n}) &= S(x_{1.34\ldots(n-1)}\,x_{2.34\ldots n}) \\
&= S(x_{1.34\ldots(n-1)})(x_2 - b_{2n.34\ldots(n-1)}\,x_n - \text{terms in } x_3 \text{ to } x_{n-1}) \\
&= S(x_{1.34\ldots(n-1)}\,x_{2.34\ldots(n-1)}) - b_{2n.34\ldots(n-1)}\,S(x_{1.34\ldots(n-1)}\,x_{n.34\ldots(n-1)})
\end{aligned}
$$

Replacing $b_{2n.34\ldots(n-1)}$ by $b_{n2.34\ldots(n-1)}\,\sigma^2_{2.34\ldots(n-1)}/\sigma^2_{n.34\ldots(n-1)}$, we have:

$$b_{12.34\ldots n}\,\sigma^2_{2.34\ldots n} = b_{12.34\ldots(n-1)}\,\sigma^2_{2.34\ldots(n-1)} - b_{1n.34\ldots(n-1)}\,b_{n2.34\ldots(n-1)}\,\sigma^2_{2.34\ldots(n-1)}$$

or, from (14.9),

$$b_{12.34\ldots n} = \frac{b_{12.34\ldots(n-1)} - b_{1n.34\ldots(n-1)}\,b_{n2.34\ldots(n-1)}}{1 - b_{2n.34\ldots(n-1)}\,b_{n2.34\ldots(n-1)}} \qquad (14.12)$$

The student should note that this is an expression of the form

$$b_{12.n} = \frac{b_{12} - b_{1n}\,b_{n2}}{1 - b_{2n}\,b_{n2}}$$

with the subscripts $34\ldots(n-1)$ added throughout. The coefficient $b_{12.34\ldots n}$ may therefore be regarded as determined from a regression equation of the form

$$x_{1.34\ldots(n-1)} = b_{12.34\ldots n}\,x_{2.34\ldots(n-1)} + b_{1n.23\ldots(n-1)}\,x_{n.34\ldots(n-1)}$$

i.e. it is the partial regression of $x_{1.34\ldots(n-1)}$ on $x_{2.34\ldots(n-1)}$, $x_{n.34\ldots(n-1)}$ being given. As any other secondary suffix might have been eliminated in lieu of $n$, we might also regard it as the partial regression of $x_{1.45\ldots n}$ on $x_{2.45\ldots n}$, $x_{3.45\ldots n}$ being given, and so on.


Expression of Correlation Coefficient in terms of Coefficients of 
Lower Orders. 

14.15. From equation (14.12) we may readily obtain a corresponding equation for correlations. For (14.12) may be written:

$$b_{12.34\ldots n} = \frac{r_{12.34\ldots(n-1)} - r_{1n.34\ldots(n-1)}\,r_{2n.34\ldots(n-1)}}{1 - r^2_{2n.34\ldots(n-1)}} \cdot \frac{\sigma_{1.34\ldots(n-1)}}{\sigma_{2.34\ldots(n-1)}}$$

Hence, writing down the corresponding expression for $b_{21.34\ldots n}$ and taking the square root:

$$r_{12.34\ldots n} = \frac{r_{12.34\ldots(n-1)} - r_{1n.34\ldots(n-1)}\,r_{2n.34\ldots(n-1)}}{(1 - r^2_{1n.34\ldots(n-1)})^{1/2}\,(1 - r^2_{2n.34\ldots(n-1)})^{1/2}} \qquad (14.13)$$

This is, similarly, the expression for three variables:

$$r_{12.n} = \frac{r_{12} - r_{1n}\,r_{2n}}{(1 - r_{1n}^2)^{1/2}\,(1 - r_{2n}^2)^{1/2}}$$

with the secondary subscripts added throughout, and $r_{12.34\ldots n}$ can be assigned interpretations corresponding to those of $b_{12.34\ldots n}$ above. Evidently equation (14.13) permits of an absolute check on the arithmetic in the calculation of all partial coefficients of an order higher than the first, for any one of the secondary suffixes of $r_{12.34\ldots n}$ can be eliminated so as to obtain another equation of the same form as (14.13), and the value obtained for $r_{12.34\ldots n}$ by inserting the values of the coefficients of lower order in the expression on the right must be the same in each case.
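The successive use of (14.13), and the check obtained by eliminating the secondary suffixes in a different order, might be sketched as follows. The six correlations of order zero are those tabulated in Example 14.2 below; the dictionary `TOTAL` and the recursive function `partial` are illustrative names chosen for this sketch.

```python
import math

# Zero-order correlations of Example 14.2 (variables numbered 1 to 4)
TOTAL = {
    frozenset((1, 2)): 0.52, frozenset((1, 3)): 0.41, frozenset((1, 4)): -0.14,
    frozenset((2, 3)): 0.49, frozenset((2, 4)): 0.23, frozenset((3, 4)): 0.25,
}

def partial(i, j, given=()):
    """r_ij.given, built up by repeated application of equation (14.13).

    The last suffix in `given` is the one eliminated at this step; the three
    coefficients of the next lower order are found by recursion.
    """
    if not given:
        return TOTAL[frozenset((i, j))]
    *rest, k = given
    rest = tuple(rest)
    r_ij = partial(i, j, rest)
    r_ik = partial(i, k, rest)
    r_jk = partial(j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))

# r_12.34 reached through r_12.3, r_14.3, r_24.3 ...
via_suffix_3 = partial(1, 2, (3, 4))
# ... and through r_12.4, r_13.4, r_23.4
via_suffix_4 = partial(1, 2, (4, 3))

print(round(via_suffix_3, 3), round(via_suffix_4, 3))
```

The two routes agree to within rounding error of the machine, which is the "absolute check" described above; both give a value of about 0.457.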


Practical Procedure. 

14.16. The equations now obtained provide all that is necessary for the arithmetical solution of problems in multiple correlation. The best mode of procedure on the whole, having calculated all the correlations and standard deviations of order zero, is (1) to calculate the correlations of higher order by successive applications of equation (14.13); (2) to calculate any required standard deviations by equation (14.10); (3) to calculate any required regressions by equation (14.8); the use of equation (14.12) for calculating the regressions of successive orders directly from one another is comparatively clumsy. We will give two illustrations, the first for three and the second for four variables. The introduction of more variables does not involve any difference in the form of the arithmetic, but rapidly increases the amount.¹

Example 14.1. In Exercise 11.2, page 221, we gave some data of (1) the average earnings of agricultural labourers, (2) the percentage of the population in receipt of poor law relief, (3) the ratios of the numbers in receipt of outdoor relief to those relieved in the workhouse, for 38 rural districts. Required to work out the partial correlations, regressions, etc., for these three variables.

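Using as our notation $X_1$ = average earnings, $X_2$ = percentage of population in receipt of relief, $X_3$ = out-relief ratio, the first constants determined are:

$$
\begin{array}{lll}
M_1 = 15.9 \text{ shillings} & \sigma_1 = 1.71 \text{ shillings} & r_{12} = -0.66 \\
M_2 = 3.67 \text{ per cent.} & \sigma_2 = 1.29 \text{ per cent.} & r_{13} = -0.13 \\
M_3 = 5.79 & \sigma_3 = 3.09 & r_{23} = +0.60
\end{array}
$$

To obtain the partial correlations, equation (14.13) is used direct in its simplest form:

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{(1 - r_{13}^2)^{1/2}\,(1 - r_{23}^2)^{1/2}}$$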

The work is best done systematically and the results collected in tabular form, especially if logarithms are used, as many of the logarithms occur repeatedly. First, it will be noted that the logarithms of $(1 - r^2)^{1/2}$ occur in all the denominators; these had, accordingly, better be worked out at once and tabulated (col. 2 of the table below). In column 3 the

1.             2.            3.          4.          5.          6.            7.            8.                9.
Correlation    log √(1-r²)   Product     Numerator   log         log           log           Correlation of    log √(1-r²)
of zero order                term of                 numerator   denominator   correlation   first order
                             numerator

r_12 = -0.66   1̄.87580       -0.0780     -0.5820     1̄.76492     1̄.89938       1̄.86554       r_12.3 = -0.73    1̄.83216
r_13 = -0.13   1̄.99629       -0.3960     +0.2660     1̄.42488     1̄.77889       1̄.64599       r_13.2 = +0.44    1̄.95267
r_23 = +0.60   1̄.90309       +0.0858     +0.5142     1̄.71113     1̄.87209       1̄.83904       r_23.1 = +0.69    1̄.85946


product term of the numerator of each partial coefficient is entered, i.e. the product of the two other coefficients on the remaining lines in column 1; subtracting this from the coefficient on the same line in column 1, we have the numerator (col. 4) and can enter its logarithm. The logarithm of the

¹ It will be noticed from the preceding work that all correlations are assumed to be determined by the product-sum formula. The method has been applied with correlations determined in other ways, e.g. from fourfold or contingency tables or by the method of ranks. In spite of the favourable result of an experimental test (Ethel M. Newbold, “Notes on an Experimental Test of Errors in Partial Correlation derived from Four-fold and Biserial Total Coefficients,” Biometrika, vol. 17, 1925, p. 251), the results obtained in such ways remain of doubtful value.







denominator (col. 6) is obtained at once by adding the two logarithms of $(1 - r^2)^{1/2}$ on the remaining lines of the table, and subtracting the logarithms of the denominators from those of the numerators, we have the logarithms of the correlations of the first order. It is also as well to calculate at once, for reference in the calculation of standard deviations of the second order, the values of $\log\sqrt{1 - r^2}$ for the first-order coefficients (col. 9).

Having obtained the correlations, we can now proceed to the regressions. If we wish to find all the regression equations, we shall have six regressions to calculate from equations of the form

$$b_{12.3} = r_{12.3}\,\frac{\sigma_{1.3}}{\sigma_{2.3}}$$

These will involve all the six standard deviations of the first order $\sigma_{1.2}$, $\sigma_{1.3}$, $\sigma_{2.1}$, $\sigma_{2.3}$, etc. The standard deviations of the first order are not in themselves of much interest, but the standard deviations of the second order are important, as being the standard errors or root-mean-square errors of estimate made in using the regression equations of the second order. We may save needless arithmetic, therefore, by replacing the standard deviations of the first order by those of the second, omitting the former entirely, and transforming the above equation for $b_{12.3}$ to the form

$$b_{12.3} = r_{12.3}\,\frac{\sigma_{1.23}}{\sigma_{2.13}}$$

This transformation is a useful one and should be noted by the student. The values of each $\sigma$ may be calculated twice independently by the formulae of the form

$$
\begin{aligned}
\sigma_{1.23} &= \sigma_1\,(1 - r_{12}^2)^{1/2}\,(1 - r_{13.2}^2)^{1/2} \\
&= \sigma_1\,(1 - r_{13}^2)^{1/2}\,(1 - r_{12.3}^2)^{1/2}
\end{aligned}
$$

so as to check the arithmetic; the work is rapidly done if the values of $\log\sqrt{1 - r^2}$ have been tabulated. The values found are:

log σ_1.23 = 0.06146     σ_1.23 = 1.15
log σ_2.13 = 1̄.84584     σ_2.13 = 0.70
log σ_3.12 = 0.34571     σ_3.12 = 2.22


From these and the logarithms of the r's we have:

log b_12.3 = 0.08116,     b_12.3 = -1.21
log b_13.2 = 1̄.36174,     b_13.2 = +0.23
log b_21.3 = 1̄.64992,     b_21.3 = -0.45
log b_23.1 = 1̄.33917,     b_23.1 = +0.22
log b_31.2 = 1̄.93024,     b_31.2 = +0.85
log b_32.1 = 0.33891,     b_32.1 = +2.18

That is, the regression equations are:

(1) $x_1 = -1.21x_2 + 0.23x_3$
(2) $x_2 = -0.45x_1 + 0.22x_3$
(3) $x_3 = +0.85x_1 + 2.18x_2$

or, transferring the origins to zero:

(1) Earnings $X_1 = +19.0 - 1.21X_2 + 0.23X_3$
(2) Pauperism $X_2 = +9.55 - 0.45X_1 + 0.22X_3$
(3) Out-relief ratio $X_3 = -15.7 + 0.85X_1 + 2.18X_2$

The units are throughout one shilling for the earnings $X_1$, 1 per cent. for the pauperism $X_2$ and 1 for the out-relief ratio $X_3$.
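The whole of the arithmetic of this example can be retraced without logarithms. The sketch below is an assumption-laden illustration rather than the authors' procedure: it takes the means, standard deviations and total correlations of the example as read here (in particular $M_1$ = 15.9 and $M_2$ = 3.67), and the helper names `r`, `partial`, `sd_second_order`, `regression` and `equation` are introduced only for this purpose.

```python
import math

# Constants of Example 14.1 as read from the text (an assumption of this sketch)
mean = {1: 15.9, 2: 3.67, 3: 5.79}
sd = {1: 1.71, 2: 1.29, 3: 3.09}
total = {(1, 2): -0.66, (1, 3): -0.13, (2, 3): 0.60}

def r(i, j):
    """Total correlation between variables i and j."""
    return total[(min(i, j), max(i, j))]

def partial(i, j, k):
    """First-order partial correlation r_ij.k, equation (14.13)."""
    return (r(i, j) - r(i, k) * r(j, k)) / math.sqrt(
        (1 - r(i, k)**2) * (1 - r(j, k)**2))

def sd_second_order(i):
    """sigma_i.jk = sigma_i (1 - r_ij^2)^(1/2) (1 - r_ik.j^2)^(1/2)."""
    j, k = [v for v in (1, 2, 3) if v != i]
    return sd[i] * math.sqrt(1 - r(i, j)**2) * math.sqrt(1 - partial(i, k, j)**2)

def regression(i, j):
    """b_ij.k = r_ij.k * sigma_i.jk / sigma_j.ik."""
    k = next(v for v in (1, 2, 3) if v not in (i, j))
    return partial(i, j, k) * sd_second_order(i) / sd_second_order(j)

def equation(i):
    """Constant term and the two coefficients of the regression of X_i."""
    j, k = [v for v in (1, 2, 3) if v != i]
    b_j, b_k = regression(i, j), regression(i, k)
    return mean[i] - b_j * mean[j] - b_k * mean[k], b_j, b_k

for i in (1, 2, 3):
    a, b_j, b_k = equation(i)
    print(f"X{i}: constant {a:+.2f}, coefficients {b_j:+.2f} and {b_k:+.2f}")
```

Because the coefficients are here carried unrounded, the constant terms may differ in the last figure from those printed above, which appear to be formed from the coefficients after rounding to two places.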






Now let us examine the light thrown by these results on the relationship 
between the variables. 

The first and second regression equations are those of most practical importance. The argument has been advanced that the giving of out-relief tends to lower earnings, and the total coefficient ($r_{13} = -0.13$) between earnings ($X_1$) and out-relief ($X_3$), though very small, does not seem inconsistent with such a hypothesis. The partial correlation coefficient ($r_{13.2} = +0.44$) and the regression equation (1), however, indicate that in unions with a given percentage of the population in receipt of relief ($X_2$) the earnings are highest where the proportion of out-relief is highest; and this is, in so far, against the hypothesis of a tendency to lower wages. It remains possible, of course, that out-relief may adversely affect the possibility of earning, e.g. by limiting the employment of the old.

As regards pauperism, the argument might be advanced that the observed correlation ($r_{23} = +0.60$) between pauperism and out-relief was in part due to the negative correlation ($r_{13} = -0.13$) between earnings and out-relief. Such a hypothesis would have little to support it in view of the smallness and doubtful significance of $r_{13}$, and is definitely contradicted by the positive partial correlation $r_{23.1} = +0.69$ and the second regression equation. The third regression equation shows that the proportion of out-relief is on the whole highest where earnings are highest and pauperism greatest. It should be noticed, however, that a negative ratio is clearly impossible, and consequently the relation cannot be strictly linear; but the third equation gives possible (positive) average ratios for all the combinations of pauperism and earnings that actually occur.

Example 14.2 (Four Variables). As an illustration of the form of the work in the case of four variables, we will take a portion of the data from another investigation into the causation of pauperism.

The variables are the ratios of the values in 1891 to the values in 1881 (taken as 100) of

1. The percentage of the population in receipt of relief,
2. The ratio of the numbers given outdoor relief to the numbers relieved in the workhouse,
3. The percentage of the population over 65 years of age,
4. The population itself,

in the metropolitan group of 32 unions, and the fundamental constants (means, standard deviations and correlations) are as follows:


Table 14.1.

            1.        2.                       3.                 4.
Variable    Means.    Standard      Pair       Correlation        log √(1-r²).
                      deviations.              coefficient.

1           104.7     29.2          12         +0.52              1̄.93154
2            90.6     41.7          13         +0.41              1̄.96003
3           107.7      5.5          14         -0.14              1̄.99570
4           111.3     23.8          23         +0.49              1̄.94038
                                    24         +0.23              1̄.98820
                                    34         +0.25              1̄.98598
















It is seen that the average changes are not great; the percentages of the population in receipt of relief have increased on an average by 4.7 per cent., the out-relief ratio has dropped by 9.4 per cent. and the percentage of the old has increased by 7.7 per cent., while the population of the unions has risen on the average by 11.3 per cent. At the same time the standard deviations of the first, second and fourth variables are very large. As a matter of fact, while in one union the pauperism decreased by nearly 50 per cent. and in others by 20 per cent., in some there were increases of

Table 14.2.

1.                   2.           3.            4.                    5.
Correlation          Product      Numerator.    Correlation           log √(1-r²).
coefficient          term of                    coefficient
(zero order).        numerator.                 (first order).

12    +0.52          +0.2009      +0.3191       12.3    +0.4013       1̄.96187
13    +0.41          +0.2548      +0.1552       13.2    +0.2084       1̄.99035
23    +0.49          +0.2132      +0.2768       23.1    +0.3553       1̄.97070

12    +0.52          -0.0322      +0.5522       12.4    +0.5731       1̄.91355
14    -0.14          +0.1196      -0.2596       14.2    -0.3123       1̄.97772
24    +0.23          -0.0728      +0.3028       24.1    +0.3580       1̄.97022

13    +0.41          -0.0350      +0.4450       13.4    +0.4642       1̄.94731
14    -0.14          +0.1025      -0.2425       14.3    -0.2746       1̄.98297
34    +0.25          -0.0574      +0.3074       34.1    +0.3404       1̄.97326

23    +0.49          +0.0575      +0.4325       23.4    +0.4590       1̄.94863
24    +0.23          +0.1225      +0.1075       24.3    +0.1274       1̄.99645
34    +0.25          +0.1127      +0.1373       34.2    +0.1618       1̄.99424


60, 80 and 90 per cent.; similarly, in the case of the out-relief, in several unions the ratio was decreased by 10 to 60 per cent., a consistent anti-out-relief policy having been enforced; in others the ratio was doubled, and more than doubled. As regards population, the more central districts showed decreases ranging up to 20 and 25 per cent., the circumferential districts increases of 45 to 80 per cent. The correlations of order zero are not large, the changes in the rate of pauperism exhibiting the highest correlation with changes in the out-relief ratio, slightly less with changes in the proportion of old and very little with changes in population.

The correlations of the second order are obtained in two steps. In the first place, the six coefficients of order zero are grouped in four sets of three, corresponding to the four sets of three variables formed by omitting each one of the four variables in turn (Table 14.2, col. 1). Each of these sets of three coefficients is then treated in the same manner as in the last example, and so the correlations of the first order (Table 14.2, col. 4) are obtained. The first-order coefficients are then regrouped in sets of three, with the same secondary suffix (Table 14.3, col. 1), and these are treated precisely in the same way as the coefficients of order zero. In this way, it will be seen, the value of each coefficient of the second order is arrived at in two ways independently, and so the arithmetic is checked: $r_{12.34}$ occurs in the first and fourth lines, for instance, $r_{13.24}$ in the second and seventh, and so on. Of course slight differences may occur in the last digit if a sufficient number of digits is not retained, and for this reason the intermediate work should be carried to a greater degree of accuracy than is necessary in the final result; thus four places of decimals were retained throughout in the intermediate work of this example, and three in the final result. If he carries out an independent calculation, the student may differ slightly from the logarithms given in this and the following work, if more or fewer figures are retained.

Table 14.3.

1.                     2.           3.            4.                     5.
Correlation            Product      Numerator.    Correlation            log √(1-r²).
coefficient            term of                    coefficient
(first order).         numerator.                 (second order).

12.4    +0.5731        +0.2131      +0.3600       12.34    +0.457        1̄.94901
13.4    +0.4642        +0.2631      +0.2011       13.24    +0.276        1̄.98277
23.4    +0.4590        +0.2660      +0.1930       23.14    +0.266        1̄.98408

12.3    +0.4013        -0.0350      +0.4363       12.34    +0.457
14.3    -0.2746        +0.0511      -0.3257       14.23    -0.359        1̄.97013
24.3    +0.1274        -0.1102      +0.2376       24.13    +0.270        1̄.98359

13.2    +0.2084        -0.0505      +0.2589       13.24    +0.276
14.2    -0.3123        +0.0337      -0.3460       14.23    -0.359
34.2    +0.1618        -0.0651      +0.2269       34.12    +0.244        1̄.98664

23.1    +0.3553        +0.1219      +0.2334       23.14    +0.266
24.1    +0.3580        +0.1209      +0.2371       24.13    +0.270
34.1    +0.3404        +0.1272      +0.2132       34.12    +0.244


Having obtained the correlations, the regressions can be calculated from the third-order standard deviations by equations of the form (as in the last example),

$$b_{12.34} = r_{12.34}\,\frac{\sigma_{1.234}}{\sigma_{2.134}}$$

so the standard deviations of lower orders need not be evaluated. Using equations of the form

$$\sigma_{1.234} = \sigma_1\,(1 - r_{12}^2)^{1/2}\,(1 - r_{13.2}^2)^{1/2}\,(1 - r_{14.23}^2)^{1/2}$$

we find

log σ_1.234 = 1.35740     σ_1.234 = 22.8
log σ_2.134 = 1.50597     σ_2.134 = 32.1
log σ_3.124 = 0.65773     σ_3.124 = 4.55
log σ_4.123 = 1.32914     σ_4.123 = 21.3

All the twelve regressions of the second order can be readily calculated, given these standard deviations and the correlations, but we may confine ourselves to the equation giving the changes in pauperism ($X_1$) in terms of other variables as the most important. It will be found to be

$$x_1 = 0.325x_2 + 1.383x_3 - 0.383x_4$$

or, transferring the origins and expressing the equation in terms of percentage ratios,

$$X_1 = -31.1 + 0.325X_2 + 1.383X_3 - 0.383X_4$$

or, again, in terms of percentage changes (ratio - 100):

Percentage change in pauperism
    = +1.4 per cent.
      + 0.325 times the change in out-relief ratio
      + 1.383 times the change in proportion of old
      - 0.383 times the change in population

These results render the interpretation of the total coefficients, which might be equally consistent with several hypotheses, more clear and definite. The questions would arise, for instance, whether the correlation of changes in pauperism with changes in out-relief might not be due to correlation of the latter with the other factors introduced, and whether the negative correlation with changes in population might not be due solely to the correlation of the latter with changes in the proportion of old. As a matter of fact, the partial correlations of changes in pauperism with changes in out-relief and in proportion of old are slightly less than the total correlations, but the partial correlation with changes in population is numerically greater, the figures being:

$$
\begin{array}{ll}
r_{12} = +0.52 & r_{12.34} = +0.46 \\
r_{13} = +0.41 & r_{13.24} = +0.28 \\
r_{14} = -0.14 & r_{14.23} = -0.36
\end{array}
$$

So far, then, as we have taken the factors of the case into account, there appears to be a true correlation between changes in pauperism and changes in out-relief, proportion of old and population, the latter serving, of course, as some index to changes in general prosperity. The relative influences of the three factors are indicated by the regression equation above. (For the full discussion of the case, cf. Jour. Roy. Stat. Soc., vol. 62, 1899.)
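The standard deviations of the third order and the regression equation for pauperism can be recomputed from the constants of Table 14.1. The following is a minimal sketch under that assumption; the functions `partial`, `residual_sd` and `regression` are illustrative helpers, and the mean of the first variable is taken as 104.7, the figure implied by the stated average increase of 4.7 per cent.

```python
import math

# Constants of Table 14.1 (mean of variable 1 inferred from the text)
mean = {1: 104.7, 2: 90.6, 3: 107.7, 4: 111.3}
sd = {1: 29.2, 2: 41.7, 3: 5.5, 4: 23.8}
total = {(1, 2): 0.52, (1, 3): 0.41, (1, 4): -0.14,
         (2, 3): 0.49, (2, 4): 0.23, (3, 4): 0.25}

def partial(i, j, given=()):
    """r_ij.given by repeated application of equation (14.13)."""
    if not given:
        return total[(min(i, j), max(i, j))]
    *rest, k = given
    rest = tuple(rest)
    r_ij, r_ik, r_jk = partial(i, j, rest), partial(i, k, rest), partial(j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))

def residual_sd(i):
    """Standard deviation of x_i about its regression on all the others, (14.10)."""
    others = [v for v in sorted(sd) if v != i]
    s = sd[i]
    for n, k in enumerate(others):
        # factor (1 - r_ik.(earlier suffixes)^2)^(1/2)
        s *= math.sqrt(1 - partial(i, k, tuple(others[:n]))**2)
    return s

def regression(i, j):
    """b_ij.rest = r_ij.rest * sigma_i.(all others) / sigma_j.(all others)."""
    rest = tuple(v for v in sorted(sd) if v not in (i, j))
    return partial(i, j, rest) * residual_sd(i) / residual_sd(j)

# Regression of pauperism (variable 1) on the other three variables
b = {j: regression(1, j) for j in (2, 3, 4)}
constant = mean[1] - sum(b[j] * mean[j] for j in b)

print({j: round(v, 3) for j, v in b.items()}, round(constant, 1))
```

Carried to full machine precision, the coefficients agree with those printed above to about the third decimal place; small differences in the last figure are to be expected, since the text retains only four places in the intermediate work.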

Aids to Calculation. 

14.17. To facilitate the computation of partial correlation and regression coefficients, various tables of such quantities as

$$1 - r^2, \qquad \sqrt{1 - r^2}, \qquad \frac{1}{\sqrt{(1 - r_1^2)(1 - r_2^2)}}$$

have been prepared. See, for instance, refs. (010) and (011).

The Generalised Scatter Diagram. 

14.18. The scatter diagram in two dimensions may be generalised to 
three dimensions, and may also be used as a mental construct for higher 
dimensions, though no actual model can of course be made. 





Consider the case of three variates. The values of $X_1$, $X_2$ and $X_3$ associated with any given individual may be regarded as determining a point in space whose co-ordinates are $X_1$, $X_2$ and $X_3$. The totality of individuals will therefore give us a swarm of points in three-dimensional space, which will be distributed in certain ways about planes of regression.

Fig. 14.1 is drawn from a model representing the data of Example 14.1. 




Fig. 14.1. Model Illustrating the Correlation between Three Variables: (1) Average Weekly Earnings of Agricultural Labourers (data, Example 14.1 and Exercise 11.2); (2) Pauperism (percentage of the population in receipt of Poor Law Relief); (3) Out-relief Ratio (numbers given relief in their homes to one in the workhouse). A, front view; B, view of model tilted till the plane of regression for pauperism on the two remaining variables is seen as a straight line.


Four pieces of wood are fixed together like the bottom and three sides of a box. Supposing the open side to face the observer, a scale of pauperism is drawn vertically upwards along the left-hand angle at the back of the “box,” the scale starting from zero, as very small values of pauperism occur; a scale of out-relief ratio is taken along the angle between the back and bottom of the box, starting from zero at the left; finally, the scale of earnings is drawn out towards the observer along the angle between the left-hand side and the bottom, but as earnings lower than 12s. do not occur, the scale may start from 12s. at the corner. Suitable scales are: pauperism, 1 in. = 1 per cent.; out-relief ratio, 1 in. = 1 unit; earnings, 1 in. = 1s.; and the inside measures of the model may then be 17 in. × 10 in. × 8 in. high, the dimensions of the model constructed. Given these three scales, any set of observed values determines a point within the “box.” The earnings and out-relief ratio for some one union are noted first, and the corresponding point marked on the baseboard; a steel wire is then inserted vertically in the base at this point and cut off at the height corresponding, on the scale chosen, to the pauperism in the same union, being finally capped with a small ball or knob to mark the “point” clearly. The model shows very well the general tendency of the pauperism to be the higher the lower the wages and the higher the out-relief, for the highest points lie towards the back and right-hand side of the model. If some representation of all three equations of regression were to be inserted in the model, the result would be rather confusing; so the most important equation, viz. the second, giving the average rate of pauperism in terms of the other variables, may be chosen. This equation represents a plane; the lines in which it cuts the right- and left-hand sides of the “box” should be marked, holes drilled at equal intervals on these lines on the opposite sides of the box (the holes facing each other) and threads stretched through these holes, thus outlining the plane as shown in the figure. In the actual model the correlation diagrams corresponding to the three pairs of variables were drawn on the back, sides and base: they represent, of course, the elevations and plan of the points.

The student possessing some skill in handicraft would find it worth while to make such a model for some case of interest to himself, and to study on it thoroughly the nature of the plane of regression, and the relations of the partial and total correlations.

Coefficient of Multiple Correlation. 

14.19. Consider the regression equation for $x_1$:

$$x_1 = b_{12.3\ldots n}\,x_2 + b_{13.2\ldots n}\,x_3 + \ldots + b_{1n.2\ldots(n-1)}\,x_n$$

Let us write the right-hand side of this equation as $e_{1.23\ldots n}$, so that in virtue of (14.2),

$$e_{1.23\ldots n} = x_1 - x_{1.23\ldots n} \qquad (14.14)$$

Now consider the correlation between $x_1$ and $e_{1.23\ldots n}$. We have in virtue of the theorem of 14.10:

$$
\begin{aligned}
S(x_1\,e_{1.23\ldots n}) &= S\{x_1(x_1 - x_{1.23\ldots n})\} \\
&= S(x_1^2) - S\{x_1\,x_{1.23\ldots n}\} \\
&= S(x_1^2) - S(x_{1.23\ldots n})^2 \\
&= N(\sigma_1^2 - \sigma^2_{1.23\ldots n})
\end{aligned}
$$

Also,

$$
\begin{aligned}
S(e_{1.23\ldots n})^2 &= S(x_1^2) - S(x_{1.23\ldots n})^2 \\
&= N(\sigma_1^2 - \sigma^2_{1.23\ldots n})
\end{aligned}
$$

Hence, the correlation between $x_1$ and $e_{1.23\ldots n}$

$$= \frac{\sigma_1^2 - \sigma^2_{1.23\ldots n}}{\sigma_1\sqrt{\sigma_1^2 - \sigma^2_{1.23\ldots n}}} = \frac{\sqrt{\sigma_1^2 - \sigma^2_{1.23\ldots n}}}{\sigma_1}$$

We shall call this quantity $R_{1(23\ldots n)}$. We have immediately:

$$\sigma^2_{1.23\ldots n} = \sigma_1^2\,(1 - R^2_{1(23\ldots n)}) \qquad (14.15)$$

$R_{1(23\ldots n)}$ is called the multiple correlation coefficient between $x_1$ and $x_2 \ldots x_n$. We have, similarly, multiple correlations between $x_1$ and fewer variables. $R_{1(23\ldots n)}$ is called an $(n-1)$-fold multiple correlation coefficient. $R_{1(23\ldots(n-1))}$ would be an $(n-2)$-fold coefficient, and so on.

14.20. The value of $R$ may be calculated either directly from equation (14.15), or by substituting in that equation the value of $\sigma^2_{1.23\ldots n}$ obtained in (14.10), which gives:

$$1 - R^2_{1(23\ldots n)} = (1 - r_{12}^2)(1 - r_{13.2}^2)(1 - r_{14.23}^2)\ldots(1 - r^2_{1n.23\ldots(n-1)}) \qquad (14.16)$$

Properties of the Multiple Correlation Coefficient. 

14.21. $R_{1(23\ldots n)}$, being the correlation between $x_1$ and $e_{1.23\ldots n}$, measures how closely $x_1$ can be represented by the regression equation. If $R = 1$, $x_1$ can be perfectly represented by such an equation, i.e. is a linear function of $x_2 \ldots x_n$. In this case $\sigma_{1.23\ldots n} = 0$, i.e. all the residuals are zero.

It may, in fact, be shown that $R_{1(23\ldots n)}$ is greater than the correlation between $x_1$ and any linear function of $x_2 \ldots x_n$ other than that expressed in the regression equation, i.e. $e_{1.23\ldots n}$. Putting this another way, the regression coefficients in $e_{1.23\ldots n}$ may be determined by the condition that the correlation between $x_1$ and $e_{1.23\ldots n}$ is a maximum.

R is Necessarily Positive or Zero. 

14.22. This is true, since the product term $S(x_1\,e_{1.23\ldots n})$ is positive, being equal to $N(\sigma_1^2 - \sigma^2_{1.23\ldots n})$, and we see from (14.10) that $\sigma_1^2 \ge \sigma^2_{1.23\ldots n}$.

Further, from (14.16),

$$1 - R^2 \le 1 - r_{12}^2$$

i.e. $R$ is not numerically less than $r_{12}$. Similarly, it is not numerically less than any other total or partial correlation coefficient which can appear in (14.16). Hence, $R_{1(23\ldots n)}$ is not numerically less than any possible constituent coefficient of correlation.

It follows from this that if $R_{1(23\ldots n)} = 0$, all the correlation coefficients involving $x_1$ are zero, i.e. the variate $x_1$ is completely uncorrelated with the other variates.

14.23. Further, even if all the variables $X_1, X_2, \ldots X_n$ were strictly uncorrelated in the original universe as a whole, we should expect $r_{12}$, $r_{13.2}$, $r_{14.23}$, etc. to exhibit values (whether positive or negative) differing from zero in a limited sample. Hence, $R$ will not tend, on an average of such samples, to be zero, but will fluctuate round some mean value. This mean value will be the greater the smaller the number of observations in the sample, and also the greater the number of variables. When only a small number of observations is available it is, accordingly, little use to deal with a large number of variables. As a limiting case, it is evident that if we deal with $n$ variables and possess only $n$ observations, all the partial correlations of the highest possible order will be unity. We shall deal with the question of the significance of an observed value of $R$ in a later chapter (23.45).

Example 14.3. In Example 14.1 we found:

$$r_{12} = -0.66, \qquad r_{13.2} = +0.44$$

Hence, from (14.16),

$$1 - R^2_{1(23)} = \{1 - (0.66)^2\}\{1 - (0.44)^2\} = 0.455$$

whence

$$R_{1(23)} = 0.74$$

Similarly, it will be found that

$$R_{2(13)} = 0.84$$

and

$$R_{3(12)} = 0.70$$

The student may verify by inspection that these values are greater than the corresponding constituent values.
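The three multiple correlations, and the property of 14.22 that $R$ is not less than any constituent coefficient, can be checked as follows. This is a sketch only, using the total correlations of Example 14.1; `multiple_R` and the other names are hypothetical helpers built directly on equation (14.16).

```python
import math

total = {(1, 2): -0.66, (1, 3): -0.13, (2, 3): 0.60}

def r(i, j):
    """Total correlation between variables i and j."""
    return total[(min(i, j), max(i, j))]

def partial(i, j, k):
    """First-order partial correlation r_ij.k, equation (14.13)."""
    return (r(i, j) - r(i, k) * r(j, k)) / math.sqrt(
        (1 - r(i, k)**2) * (1 - r(j, k)**2))

def multiple_R(i):
    """R_i(jk) from (14.16): 1 - R^2 = (1 - r_ij^2)(1 - r_ik.j^2)."""
    j, k = [v for v in (1, 2, 3) if v != i]
    return math.sqrt(1 - (1 - r(i, j)**2) * (1 - partial(i, k, j)**2))

def constituents(i):
    """Absolute values of every total and partial coefficient involving i."""
    j, k = [v for v in (1, 2, 3) if v != i]
    return [abs(r(i, j)), abs(r(i, k)), abs(partial(i, j, k)), abs(partial(i, k, j))]

for i in (1, 2, 3):
    print(i, round(multiple_R(i), 2), [round(c, 2) for c in constituents(i)])
```

Since the partial correlations are here carried unrounded, $1 - R^2_{1(23)}$ comes out slightly below the 0.455 obtained above from two-figure values, but the coefficients agree to two places.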


Expression of Regressions and Correlations in terms of Co¬ 
efficients of Higher Orders. 

14.24. It is obvious that as equations (14.12) and (14.13) enable us to express regressions and correlations of higher orders in terms of those of lower orders, we must similarly be able to express the coefficients of lower in terms of those of higher orders. Such expressions are sometimes useful for theoretical work. Using the same method of expansion as in previous cases, we have:

$$
\begin{aligned}
0 &= S(x_{1.23\ldots n}\,x_{2.34\ldots(n-1)}) \\
&= S(x_1\,x_{2.34\ldots(n-1)}) - b_{12.34\ldots n}\,S(x_2\,x_{2.34\ldots(n-1)}) - b_{1n.23\ldots(n-1)}\,S(x_n\,x_{2.34\ldots(n-1)})
\end{aligned}
$$

That is,

$$b_{12.34\ldots(n-1)} = b_{12.34\ldots n} + b_{1n.23\ldots(n-1)}\,b_{n2.34\ldots(n-1)}$$

In this equation the coefficient on the left and the last on the right are of order $n-3$, the other two of order $n-2$. We therefore wish to eliminate the last coefficient on the right. Interchanging the suffixes 1 for $n$ and $n$ for 1, we have:

$$b_{n2.34\ldots(n-1)} = b_{n2.13\ldots(n-1)} + b_{n1.23\ldots(n-1)}\,b_{12.34\ldots(n-1)}$$

Substituting this value for $b_{n2.34\ldots(n-1)}$ in the first equation, we have:

$$b_{12.34\ldots(n-1)} = \frac{b_{12.34\ldots n} + b_{1n.23\ldots(n-1)}\,b_{n2.13\ldots(n-1)}}{1 - b_{1n.23\ldots(n-1)}\,b_{n1.23\ldots(n-1)}} \qquad (14.17)$$

This is the required equation for the regressions; it is the equation

$$b_{12} = \frac{b_{12.n} + b_{1n.2}\,b_{n2.1}}{1 - b_{1n.2}\,b_{n1.2}}$$

with secondary suffixes $34\ldots(n-1)$ added throughout. The corresponding equation for the correlations is obtained at once by writing down equation (14.17) for $b_{21.34\ldots(n-1)}$ and taking the square root of the product; this gives:

$$r_{12.34\ldots(n-1)} = \frac{r_{12.34\ldots n} + r_{1n.23\ldots(n-1)}\,r_{2n.13\ldots(n-1)}}{(1 - r^2_{1n.23\ldots(n-1)})^{1/2}\,(1 - r^2_{2n.13\ldots(n-1)})^{1/2}} \qquad (14.18)$$

which is similarly the equation

$$r_{12} = \frac{r_{12.n} + r_{1n.2}\,r_{2n.1}}{(1 - r_{1n.2}^2)^{1/2}\,(1 - r_{2n.1}^2)^{1/2}}$$

with the secondary suffixes $34\ldots(n-1)$ added throughout.
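That (14.18) undoes (14.13) can be verified on the figures of Example 14.1: the three first-order coefficients are formed from the total correlations, and $r_{12}$ is then recovered from them. The function names `to_first_order` and `to_zero_order` are illustrative and do not appear in the text.

```python
import math

def to_first_order(r_ij, r_ik, r_jk):
    """Equation (14.13) in its simplest form: r_ij.k from total correlations."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))

def to_zero_order(r_ij_k, r_ik_j, r_jk_i):
    """Equation (14.18) in its simplest form: r_ij from first-order partials."""
    return (r_ij_k + r_ik_j * r_jk_i) / math.sqrt(
        (1 - r_ik_j**2) * (1 - r_jk_i**2))

# Total correlations of Example 14.1
r12, r13, r23 = -0.66, -0.13, 0.60

r12_3 = to_first_order(r12, r13, r23)
r13_2 = to_first_order(r13, r12, r23)
r23_1 = to_first_order(r23, r12, r13)

# Going back down one order should return the coefficient we started from
r12_recovered = to_zero_order(r12_3, r13_2, r23_1)

print(round(r12_3, 2), round(r13_2, 2), round(r23_1, 2), round(r12_recovered, 2))
```

Note the change of sign in the numerator between the two formulae: the product term is subtracted in passing to the higher order and added in returning to the lower.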


Conditions of Consistence among Correlation Coefficients. 

14.25. Equations (14.13) and (14.18) imply that certain limiting inequalities must hold between the correlation coefficients in the expression on the right in each case in order that real values (values between ±1) may be obtained for the correlation coefficient on the left. These inequalities correspond precisely with those “conditions of consistence” between class-frequencies with which we dealt in Chapter 2, but we propose to treat them only briefly here. Writing (14.13) in its simplest form for $r_{12.3}$, we must have $r_{12.3}^2 \le 1$ or

$$\frac{(r_{12} - r_{13}\,r_{23})^2}{(1 - r_{13}^2)(1 - r_{23}^2)} \le 1$$

that is,

$$r_{12}^2 + r_{13}^2 + r_{23}^2 - 2r_{12}\,r_{13}\,r_{23} \le 1 \qquad (14.19)$$

if the three r's are consistent with one another. If we take $r_{12}$, $r_{13}$ as known, this gives as limits for $r_{23}$,

$$r_{12}\,r_{13} \pm \sqrt{1 - r_{12}^2 - r_{13}^2 + r_{12}^2\,r_{13}^2}$$

Similarly, writing (14.18) in its simplest form for $r_{12}$ in terms of $r_{12.3}$, $r_{13.2}$ and $r_{23.1}$, we must have:

$$r_{12.3}^2 + r_{13.2}^2 + r_{23.1}^2 + 2r_{12.3}\,r_{13.2}\,r_{23.1} \le 1 \qquad (14.20)$$

and therefore, if $r_{12.3}$ and $r_{13.2}$ are given, $r_{23.1}$ must lie between the limits

$$-r_{12.3}\,r_{13.2} \pm \sqrt{1 - r_{12.3}^2 - r_{13.2}^2 + r_{12.3}^2\,r_{13.2}^2}$$






The following table gives the limits of the third coefficient, in a few
special cases, for the three coefficients of zero order and of the first order
respectively:

            Value of                        Limits of
    r12 or r12.3    r13 or r13.2        r23         r23.1

         0               0              ±1           ±1
        ±1              ±1              +1           -1
        ±1              ∓1              -1           +1
      ±√0.5           ±√0.5           0, +1        0, -1
      ±√0.5           ∓√0.5           0, -1        0, +1

The student should notice that the set of three coefficients of order zero
and value unity are only consistent if either one only, or all three, are
positive, i.e. +1, +1, +1, or -1, -1, +1; but not -1, -1, -1. On the
other hand, the set of three coefficients of the first order and value unity
are only consistent if one only, or all three, are negative: the only
consistent sets are +1, +1, -1 and -1, -1, -1. The values of the two
given r’s need to be very high if even the sign of the third can be inferred;
if the two are equal, they must be at least equal to √0.5 or 0.707 . . .
Finally, it may be noted that no two values for the known coefficients ever
permit an inference of the value zero for the third; the fact that 1 and 2,
1 and 3 are uncorrelated, pair and pair, permits no inference of any kind
as to the correlation between 2 and 3, which may lie anywhere between
+1 and -1.
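
The limits given by (14.19) and (14.20) are easily computed. The sketch below is an illustration only, with function names of our own choosing; it uses the fact that the quantity under the square root factorises as (1 - a²)(1 - b²), and reproduces rows of the table above.

```python
from math import sqrt

def limits_zero_order(r12, r13):
    """Lower and upper limits for r23, given r12 and r13 (from 14.19)."""
    half_width = sqrt((1 - r12**2) * (1 - r13**2))
    return r12 * r13 - half_width, r12 * r13 + half_width

def limits_first_order(r12_3, r13_2):
    """Lower and upper limits for r23.1, given r12.3 and r13.2 (from 14.20)."""
    half_width = sqrt((1 - r12_3**2) * (1 - r13_2**2))
    return -r12_3 * r13_2 - half_width, -r12_3 * r13_2 + half_width

# Two given coefficients both zero: the third is entirely undetermined.
both_zero = limits_zero_order(0.0, 0.0)

# Two given coefficients both equal to the square root of 0.5.
s = sqrt(0.5)
zero_order_case = limits_zero_order(s, s)     # approximately (0, +1)
first_order_case = limits_first_order(s, s)   # approximately (-1, 0)
```

With two equal coefficients below √0.5 the interval straddles zero, which is why the sign of the third coefficient cannot then be inferred.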


Fallacies in the Interpretation of Correlation Coefficients. 

14.26. We do not think it necessary to add to this chapter a detailed
discussion of the nature of fallacies on which the theory of multiple
correlation throws much light. The general nature of such fallacies is the same
as for the case of attributes, and was discussed fully in Chapter 4. It
suffices to point out the principal sources of fallacy which are suggested
at once by the form of the partial correlation

$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} \qquad (a)$$

and from the form of the corresponding expression for $r_{12}$ in terms of the
partial coefficients:

$$r_{12} = \frac{r_{12.3} + r_{13.2}\, r_{23.1}}{\sqrt{(1 - r_{13.2}^2)(1 - r_{23.1}^2)}} \qquad (b)$$

From the form of the numerator of (a) it is evident (1) that even if $r_{12}$ be
zero, $r_{12.3}$ will not be zero unless either $r_{13}$ or $r_{23}$, or both, are zero. If $r_{13}$
and $r_{23}$ are of the same sign, the partial correlation will be negative; if of
opposite sign, positive. Thus the quantity of a crop might appear to be
unaffected, say, by the amount of rainfall during some period preceding
harvest: this might be due merely to a correlation between rain and
low temperature, the partial correlation between crop and rainfall being
positive and important. We may thus easily misinterpret a coefficient of
correlation which is zero. (2) $r_{12.3}$ may be, indeed often is, of opposite
sign to $r_{12}$, and this may lead to still more serious errors of interpretation.

From the form of the numerator of (b), on the other hand, we see that,
conversely, $r_{12}$ will not be zero even though $r_{12.3}$ is zero, unless either
$r_{13.2}$ or $r_{23.1}$ is zero. This corresponds to the theorem of 4.12, and indicates
a source of fallacies similar to those there discussed.
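
A small numerical illustration may make the first point concrete. The figures are hypothetical, assumed only for the purpose of the example: let 1 denote the crop, 2 the rainfall and 3 the temperature, and suppose $r_{12} = 0$, $r_{13} = +0.5$, $r_{23} = -0.6$. The partial correlation between crop and rainfall is then

$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = \frac{0 - (0.5)(-0.6)}{\sqrt{(1 - 0.25)(1 - 0.36)}} = \frac{0.30}{\sqrt{0.48}} \approx +0.43$$

so that a total correlation of zero is here compatible with a quite appreciable positive partial correlation.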

14.27. We have seen that $r_{12.3}$ is the correlation between $x_{1.3}$ and $x_{2.3}$,
and that we might determine the value of this partial correlation by drawing
up the actual correlation table for the two residuals in question. Suppose,
however, that instead of drawing up a single table we drew up a series of
tables for values of $x_{1.3}$ and $x_{2.3}$ associated with values of $x_3$ lying within
successive class-intervals of its range. In general, the value of $r_{12.3}$ would
not be the same (or approximately the same) for all such tables, but would
exhibit some systematic change as the value of $x_3$ increased. Hence $r_{12.3}$
should be regarded, in general, as of the nature of an average correlation:
the cases in which it measures the correlation between $x_{1.3}$ and $x_{2.3}$ for
every value of $x_3$ (cf. below, 14.31) are probably exceptional. The process
for determining partial associations (cf. Chapter 4) is, it will be remembered,
thorough and complete, as we always obtain the actual tables exhibiting
the association between, say, A and B in the universe of C’s and the universe
of γ’s: that two such associations may differ materially is illustrated by
Example 4.1, page 52. It might sometimes serve as a useful check on
partial correlation work to reclassify the observations by the fundamental
methods of Chapter 4. For the general case an extension of the method
of the “correlation ratio” (13.5) might be useful, though exceedingly
laborious.
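
That $r_{12.3}$ is simply the correlation between the two residuals can be verified directly on a set of observations. The following sketch is ours, not the authors': the eight observations are invented for illustration and the helper names are hypothetical. It computes the residuals of $x_1$ and $x_2$ on $x_3$, correlates them, and compares the result with the value given by the formula in terms of the total correlations.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def corr(u, v):
    """Product-moment correlation between two equal-length lists."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

def residuals(y, x):
    """Deviations of y from its line of regression on x."""
    my, mx = mean(y), mean(x)
    b = (sum((a - mx) * (c - my) for a, c in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    return [(c - my) - b * (a - mx) for a, c in zip(x, y)]

# Invented observations, for illustration only.
x1 = [2.0, 4.1, 3.9, 6.2, 5.8, 8.1, 7.7, 9.9]
x2 = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
x3 = [5.0, 3.0, 6.0, 2.0, 7.0, 1.0, 4.0, 8.0]

# Partial correlation as the correlation between the residuals x_1.3 and x_2.3.
r_from_residuals = corr(residuals(x1, x3), residuals(x2, x3))

# The same quantity from the three total correlations.
r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)
r_from_formula = (r12 - r13 * r23) / sqrt((1 - r13**2) * (1 - r23**2))
```

The two values coincide; what the single coefficient cannot show is whether the correlation between the residuals is the same in every class-interval of $x_3$, which would require the series of separate tables described above.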


Multivariate Normal Correlation. 


14.28. The theorems and results of Chapter 12 in regard to normal
correlation can be extended to the case of n variates, which we have studied
in this chapter.

In fact, suppose we have n variates $x_1, x_2, x_3, \ldots x_n$, measured from
their respective means, with standard deviations $\sigma_1, \sigma_2, \sigma_3, \ldots \sigma_n$. Let
us first consider the simple case in which they are normally distributed
and each is completely independent of the others.

Then, if $y_{12 \ldots n}$ denote the frequency of the combination of deviations
$x_1, x_2, \ldots x_n$, we have:

$$y_{12 \ldots n} = y'_{12 \ldots n}\, e^{-\frac{1}{2}\phi(x_1, x_2, \ldots x_n)}$$

where

$$\phi(x_1, x_2, \ldots x_n) = \frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2} + \ldots + \frac{x_n^2}{\sigma_n^2} \qquad (14.21)$$

Now consider the variates $x_1, x_{2.1}, x_{3.12}, \ldots x_{n.12 \ldots (n-1)}$. Whether
$x_1, x_2, \ldots x_n$ are correlated or not, these variates are uncorrelated,
in virtue of 14.10. Let us further suppose they are independent and
normally distributed. Then their distribution is given by

$$y_{12 \ldots n} = y'_{12 \ldots n}\, e^{-\frac{1}{2}\phi(x_1, x_{2.1}, \ldots x_{n.12 \ldots (n-1)})} \qquad (14.22)$$



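where

$$\phi(x_1, x_{2.1}, \ldots x_{n.12 \ldots (n-1)}) = \frac{x_1^2}{\sigma_1^2} + \frac{x_{2.1}^2}{\sigma_{2.1}^2} + \ldots + \frac{x_{n.12 \ldots (n-1)}^2}{\sigma_{n.12 \ldots (n-1)}^2} \qquad (14.23)$$

and

$$y'_{12 \ldots n} = \frac{N}{(2\pi)^{n/2}\, \sigma_1 \sigma_{2.1} \ldots \sigma_{n.12 \ldots (n-1)}} \qquad (14.24)$$

The expression (14.23) may be put in a more convenient form. It may
be shown, but we omit the proof, that

$$\phi = \frac{x_1^2}{\sigma_{1.23 \ldots n}^2} + \frac{x_2^2}{\sigma_{2.13 \ldots n}^2} + \ldots + \frac{x_n^2}{\sigma_{n.12 \ldots (n-1)}^2} - 2 r_{12.3 \ldots n} \frac{x_1 x_2}{\sigma_{1.23 \ldots n}\, \sigma_{2.13 \ldots n}} - \ldots - 2 r_{(n-1)n.12 \ldots (n-2)} \frac{x_{n-1} x_n}{\sigma_{(n-1).12 \ldots (n-2)n}\, \sigma_{n.12 \ldots (n-1)}} \qquad (14.25)$$

which exhibits the form as symmetrical in the variates.

Now, we showed in 14.13 that

$$\sigma_{1.23 \ldots n}^2 = \sigma_1^2\, \frac{\omega}{\omega_{11}}$$

In precisely the same way it may be shown that

$$\frac{r_{12.3 \ldots n}}{\sigma_{1.23 \ldots n}\, \sigma_{2.13 \ldots n}} = -\frac{\omega_{12}}{\sigma_1 \sigma_2\, \omega}$$

$\omega_{12}$ being the minor in $\omega$ of the term in the first row and the second
column (taken with its proper sign).

If we substitute these and analogous values in (14.22), we get:

$$y_{12 \ldots n} = \frac{N}{(2\pi)^{n/2}\, \sigma_1 \sigma_2 \ldots \sigma_n \sqrt{\omega}} \exp\left[ -\frac{1}{2\omega} \left\{ \omega_{11} \frac{x_1^2}{\sigma_1^2} + \ldots + \omega_{nn} \frac{x_n^2}{\sigma_n^2} + 2\omega_{12} \frac{x_1 x_2}{\sigma_1 \sigma_2} + \ldots + 2\omega_{(n-1)n} \frac{x_{n-1} x_n}{\sigma_{n-1} \sigma_n} \right\} \right] \qquad (14.26)$$

This is a form which is very frequently quoted.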
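
The determinantal expressions used above can be checked for three variables without any special apparatus. The sketch below is an illustration under stated assumptions: the correlations are hypothetical, the helper names are ours, and the "minors" are taken with their proper signs (i.e. as cofactors), which is the convention under which the signs in (14.25) and (14.26) agree.

```python
from math import sqrt

def cofactor(m, i, j):
    """Signed minor of element (i, j) of a 3 x 3 matrix (0-based indices)."""
    rows = [r for r in range(3) if r != i]
    cols = [c for c in range(3) if c != j]
    minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
             - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
    return (-1) ** (i + j) * minor

def det3(m):
    """Determinant by expansion along the first row."""
    return sum(m[0][j] * cofactor(m, 0, j) for j in range(3))

# Hypothetical total correlations, for illustration only.
r12, r13, r23 = 0.6, 0.5, 0.4
R = [[1.0, r12, r13],
     [r12, 1.0, r23],
     [r13, r23, 1.0]]

omega = det3(R)
w11, w22, w12 = cofactor(R, 0, 0), cofactor(R, 1, 1), cofactor(R, 0, 1)

# sigma_{1.23}^2 / sigma_1^2 from the determinant ...
variance_ratio_det = omega / w11
# ... and from (1 - r12^2)(1 - r13.2^2).
r13_2 = (r13 - r12 * r23) / sqrt((1 - r12**2) * (1 - r23**2))
variance_ratio_product = (1 - r12**2) * (1 - r13_2**2)

# r_12.3 from the determinant and from the ordinary formula.
r12_3_det = -w12 / sqrt(w11 * w22)
r12_3_formula = (r12 - r13 * r23) / sqrt((1 - r13**2) * (1 - r23**2))
```

Both pairs of values agree, which confirms in this simple case the substitutions that lead from (14.25) to (14.26).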

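14.29. From these formulæ several important results follow immediately.

In the first place, for any fixed values $h_2 \ldots h_n$ of $x_2 \ldots x_n$, the
exponent (14.25) becomes:

$$\frac{x_1^2}{\sigma_{1.23 \ldots n}^2} - 2 x_1 \left\{ \frac{r_{12.3 \ldots n}}{\sigma_{1.23 \ldots n}\, \sigma_{2.13 \ldots n}} h_2 + \ldots + \frac{r_{1n.2 \ldots (n-1)}}{\sigma_{1.23 \ldots n}\, \sigma_{n.1 \ldots (n-1)}} h_n \right\} + \text{constant terms}$$

$$= \frac{1}{\sigma_{1.23 \ldots n}^2} \left\{ x_1 - \left( r_{12.3 \ldots n} \frac{\sigma_{1.23 \ldots n}}{\sigma_{2.13 \ldots n}} h_2 + \ldots + r_{1n.2 \ldots (n-1)} \frac{\sigma_{1.23 \ldots n}}{\sigma_{n.1 \ldots (n-1)}} h_n \right) \right\}^2 + \text{constant terms}$$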




Hence $x_1$ is distributed normally about the mean, $m_1$, given by

$$m_1 = r_{12.3 \ldots n} \frac{\sigma_{1.23 \ldots n}}{\sigma_{2.13 \ldots n}} h_2 + \ldots + r_{1n.2 \ldots (n-1)} \frac{\sigma_{1.23 \ldots n}}{\sigma_{n.1 \ldots (n-1)}} h_n \qquad (14.27)$$

Hence every array of every order is normally distributed.

It follows in a similar way that any linear function of the x’s is
distributed normally.

In particular, all deviations of any order and with any number of
suffixes are normally distributed.

14.30. Secondly, as will be seen from (14.27), the regression of $x_1$ on
the other variables is linear. It follows that the regression of any variate
on any or all of the others is linear. In (14.27), for instance, the
expressions $r_{12.3 \ldots n}\, \sigma_{1.23 \ldots n} / \sigma_{2.13 \ldots n}$, etc., are the partial
regressions $b_{12.3 \ldots n}$, etc.
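
For three variables the mean of an array of $x_1$, as given by (14.27), can be worked out as follows. This is a sketch only: the standard deviations and correlations are hypothetical, and the variable names are ours. Each partial regression is formed as a partial correlation multiplied by a ratio of standard deviations of the second order, and is then compared with the familiar closed form in terms of the total correlations.

```python
from math import sqrt

# Hypothetical standard deviations and total correlations (illustration only).
s1, s2, s3 = 2.0, 1.0, 4.0
r12, r13, r23 = 0.6, 0.5, 0.4

def partial(r_ab, r_ac, r_bc):
    """First-order partial correlation r_ab.c."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac**2) * (1 - r_bc**2))

# Determinant of the correlations for three variables, and the
# second-order standard deviations sigma_{1.23}, sigma_{2.13}, sigma_{3.12}.
omega = 1 + 2 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2
s1_23 = s1 * sqrt(omega / (1 - r23**2))
s2_13 = s2 * sqrt(omega / (1 - r13**2))
s3_12 = s3 * sqrt(omega / (1 - r12**2))

# Partial regressions b_12.3 and b_13.2 as they appear in (14.27).
b12_3 = partial(r12, r13, r23) * s1_23 / s2_13
b13_2 = partial(r13, r12, r23) * s1_23 / s3_12

def array_mean(h2, h3):
    """Mean of x1 in the array for which x2 = h2 and x3 = h3."""
    return b12_3 * h2 + b13_2 * h3
```

The mean of the array is a linear function of the assigned values `h2` and `h3`, which is the linearity of regression asserted in the text.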

14.31. Finally if, in equation (14.23), any fixed values be assigned to
$x_{3.12}$ and all the following deviations, the correlation between $x_1$ and $x_2$, on
expanding $x_{2.1}$, is, as we have seen, normal correlation. Similarly, if any
fixed values be assigned to $x_1$, to $x_{4.123}$, and all the following deviations, on
reducing $x_{3.12}$ to the second order we shall find that the correlation between
$x_{2.1}$ and $x_{3.1}$ is normal correlation, the correlation coefficient being $r_{23.1}$, and
so on. That is to say, using k to denote any group of secondary suffixes, (1)
the correlation between any two deviations $x_{m.k}$ and $x_{n.k}$ is normal correlation;
(2) the correlation between the said deviations is $r_{mn.k}$ whatever the particular
fixed values assigned to the remaining deviations. The latter conclusion, it
will be seen, renders the meaning of partial correlation coefficients much
more definite in the case of normal correlation than in the general case. In
the general case $r_{mn.k}$ represents merely the average correlation, so to speak,
between $x_{m.k}$ and $x_{n.k}$: in the normal case $r_{mn.k}$ is constant for all the
sub-groups corresponding to particular assigned values of the other variables.
Thus in the case of three variables which are normally correlated, if we
assign any given value to $x_3$, the correlation between the associated values
of $x_1$ and $x_2$ is $r_{12.3}$: in the general case $r_{12.3}$, if actually worked out for the
various sub-groups corresponding, say, to increasing values of $x_3$, would
probably exhibit some continuous change, increasing or decreasing as the
case might be.


SUMMARY. 

1. The regression equation of $x_1$ on $x_2, \ldots x_n$ is written:

$$x_1 = b_{12.34 \ldots n}\, x_2 + b_{13.24 \ldots n}\, x_3 + \ldots + b_{1n.23 \ldots (n-1)}\, x_n$$

The deviation $x_{1.23 \ldots n}$ is defined as

$$x_1 - b_{12.34 \ldots n}\, x_2 - b_{13.24 \ldots n}\, x_3 - \ldots - b_{1n.23 \ldots (n-1)}\, x_n$$

and $\sigma_{1.23 \ldots n}$ is the standard deviation of $x_{1.23 \ldots n}$.

2. The equations giving the regression coefficients are:

$$S(x_2\, x_{1.23 \ldots n}) = 0$$
$$S(x_3\, x_{1.23 \ldots n}) = 0$$
$$\ldots$$
$$S(x_n\, x_{1.23 \ldots n}) = 0$$

and similar equations with $x_{2.13 \ldots n}$, etc.

3. The product-sum of any two deviations is unaltered by omitting any
or all of the secondary subscripts of either which are common to the two;
conversely, the product-sum of any deviation of order p with a deviation
of order p + q, the p subscripts being the same in each case, is unaltered by
adding to the secondary subscripts of the former any or all of the q
additional subscripts of the latter.

4. $$b_{12.34 \ldots n} = r_{12.34 \ldots n}\, \frac{\sigma_{1.34 \ldots n}}{\sigma_{2.34 \ldots n}}$$

5. Any standard deviation of order p can be expressed in terms of a
standard deviation of order p - 1 and a correlation of order p - 1. In fact,

$$\sigma_{1.23 \ldots n}^2 = \sigma_{1.23 \ldots (n-1)}^2\, (1 - r_{1n.23 \ldots (n-1)}^2)$$


6. $$\sigma_{1.23 \ldots n}^2 = \frac{\omega}{\omega_{11}}\, \sigma_1^2$$

where $\omega$ is the determinant

$$\omega = \begin{vmatrix} 1 & r_{12} & r_{13} & \ldots & r_{1n} \\ r_{21} & 1 & r_{23} & \ldots & r_{2n} \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ r_{n1} & r_{n2} & r_{n3} & \ldots & 1 \end{vmatrix}$$

and $\omega_{pq}$ is the minor of the element in the pth row and the qth column.

7. Any regression of order p may be expressed in terms of regressions
of order p - 1. In fact,

$$b_{12.34 \ldots n} = \frac{b_{12.34 \ldots (n-1)} - b_{1n.34 \ldots (n-1)}\, b_{n2.34 \ldots (n-1)}}{1 - b_{2n.34 \ldots (n-1)}\, b_{n2.34 \ldots (n-1)}}$$

8. Similarly, for correlations:

$$r_{12.34 \ldots n} = \frac{r_{12.34 \ldots (n-1)} - r_{1n.34 \ldots (n-1)}\, r_{2n.34 \ldots (n-1)}}{(1 - r_{1n.34 \ldots (n-1)}^2)^{1/2}\, (1 - r_{2n.34 \ldots (n-1)}^2)^{1/2}}$$

9. The coefficient of multiple correlation $R_{1(23 \ldots n)}$ is given by

$$\sigma_{1.23 \ldots n}^2 = \sigma_1^2\, (1 - R_{1(23 \ldots n)}^2)$$

or

$$\frac{\omega}{\omega_{11}} = 1 - R_{1(23 \ldots n)}^2$$

Also,

$$1 - R_{1(23 \ldots n)}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)(1 - r_{14.23}^2) \ldots (1 - r_{1n.23 \ldots (n-1)}^2)$$

10. It is necessarily not less than zero. If it is zero, the variate to
which it refers is completely uncorrelated with the other variates. If
R = 1, there is a linear relation between the variates.






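11. The multivariate normal surface may be written:

$$y = \frac{N}{(2\pi)^{n/2}\, \sigma_1 \sigma_2 \ldots \sigma_n \sqrt{\omega}}\, e^{-\frac{1}{2}\phi}$$

where

$$\phi = \frac{1}{\omega} \left\{ \omega_{11} \frac{x_1^2}{\sigma_1^2} + \omega_{22} \frac{x_2^2}{\sigma_2^2} + \ldots + 2\omega_{12} \frac{x_1 x_2}{\sigma_1 \sigma_2} + \ldots + 2\omega_{(n-1)n} \frac{x_{n-1} x_n}{\sigma_{n-1} \sigma_n} \right\}$$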
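
The two expressions for the multiple correlation coefficient in item 9 can be compared numerically for three variables. The following is a minimal sketch with hypothetical correlations and variable names of our own; it is not a worked solution to any of the exercises below.

```python
from math import sqrt

# Hypothetical total correlations, for illustration only.
r12, r13, r23 = 0.6, 0.5, 0.4

# 1 - R^2 as a product of factors of successively higher order.
r13_2 = (r13 - r12 * r23) / sqrt((1 - r12**2) * (1 - r23**2))
one_minus_R2_product = (1 - r12**2) * (1 - r13_2**2)

# 1 - R^2 as the ratio omega / omega_11 for three variables.
omega = 1 + 2 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2
omega_11 = 1 - r23**2
one_minus_R2_det = omega / omega_11

# Multiple correlation of variate 1 on variates 2 and 3.
R_1_23 = sqrt(1 - one_minus_R2_det)
```

Here `R_1_23` is about 0.66, which is not less than either of the total correlations of variate 1 with the others.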


EXERCISES. 

14.1. (Ref. (299).) The following means, standard deviations and
correlations are found for

$X_1$ = Seed-hay crops in cwts. per acre,
$X_2$ = Spring rainfall in inches,
$X_3$ = Accumulated temperature above 42° F. in spring,

in a certain district of England during twenty years.

    M_1 = 28.02      σ_1 = 4.42      r_12 = +0.80
    M_2 =  4.91      σ_2 = 1.10      r_13 = -0.40
    M_3 = 594        σ_3 = 85        r_23 = -0.56

Find the partial correlations and the regression equation for hay-crop on spring
rainfall and accumulated temperature.

14.2. In Exercise 14.1, find the multiple correlation coefficient of each variate
on the other two.

14.3. (The following figures must be taken as an illustration only: the data
on which they were based do not refer to uniform times or areas.)

$X_1$ = Deaths of infants under 1 year per 1000 births in same year (infantile
mortality).

$X_2$ = Number per thousand of married women occupied for gain.

$X_3$ = Death-rate of persons over 5 years of age per 10,000.

$X_4$ = Number per thousand of population living two or more to a room
(overcrowding).

Taking the figures below for thirty urban areas in England and Wales, find
the partial correlations and the regression equation for infantile mortality on
the other factors.

-=1C4 

rq- 

20*0 

r,, i 0 19 

a 

i 0 15 

IW 2 = 158 

(T 2 

71*9 

r,, -! 0 78 

r 24 

0 37 

M„ =- 1 -I S 

** 

22*4 

Ui +0-20 

T Vl 

i 0*23 

M, = 205 


130*0 




14.4. In Exercise 14.3, find the multiple correlation coefficient of $X_1$ on $X_2$
and $X_3$; and of $X_1$ on the other three variates.

14.5. (Data from W. F. Ogburn, “Factors in the Variation of Crime among
Cities,” Jour. Amer. Stat. Assoc., vol. 30, 1935, pp. 12-31.)

For certain large cities in the U.S.A.:

$X_1$ = Crime rate, being the number of known offences per thousand of
population.

$X_2$ = Percentage of male inhabitants.

$X_3$ = Percentage of total inhabitants who are foreign-born males.

$X_4$ = Number of children under 5 years of age per thousand married women
between 15 and 44 years of age.

$X_5$ = Church membership, being number of church members 13 years of
age and over per 100 of total population 13 years of age and over.


M x 19-9 
Af a -- 49*2 

Mj- 10*2 

M 4 -481*4 
M s = 410 


cTjl = 7*9 r 12 -10*41 

<r 2 ~ 1*8 r la =-0*34 

<7 a — 1*0 r u = -0 31 

74 1 r 15 =~0*ll 

rr 5 - 10 8 r 23 - 4 0*25 


r a4 = -0*19 
^*25~ 0*35 
r u = -f 0*44 
r 35 - 4 0*33 
r *(> — + 0 85 


Find the regression equation of $X_1$ on the other four variables. Find also
$R_{1(2345)}$.

Find, further, $r_{15.3}$, $r_{15.4}$ and $r_{15.34}$. Discuss the influence of church membership
on crime for these data.

14.6. Show that for n variates there are ${}^{n}C_{2}$ total correlation coefficients,
$(n-2)\,{}^{n}C_{2}$ correlation coefficients of order 1, ${}^{n-2}C_{2}\,{}^{n}C_{2}$ correlation coefficients of
order 2, and ${}^{n-2}C_{s}\,{}^{n}C_{2}$ of order s. Hence show that there are $n(n-1)2^{n-3}$
correlation coefficients and $n(n-1)2^{n-2}$ regression coefficients.

14.7. Find the number of multiple correlation coefficients of order s and the
total number of such coefficients for n variables.

14.8. If all the correlations of order zero are equal, say = r, what are the values
of the partial correlations of successive orders?

Under the same conditions, what is the limiting value of r if all the equal
correlations are negative and n variables have been observed?

14.9. Write down from inspection the values of the partial correlations for the
three variables

$$X_1, \quad X_2 \quad \text{and} \quad X_3 = aX_1 + bX_2$$

14.10. If the relation

$$a x_1 + b x_2 + c x_3 = 0$$

holds for all sets of values of $x_1$, $x_2$ and $x_3$, what must the partial correlations
be?



CHAPTER 15. 


CORRELATION: ILLUSTRATIONS AND PRACTICAL 

METHODS. 

15.1. The student, especially the student of economic statistics, to
whom this chapter is principally addressed, should be careful to note that
the coefficient of correlation, like an average or a measure of dispersion,
only exhibits in a summary and comprehensible form one particular aspect
of the facts on which it is based, and the real difficulties arise in the
interpretation of the coefficient when obtained. The value of the coefficient
may be consistent with some given hypothesis, but it may be equally
consistent with others; and not only are care and judgment essential for
the discussion of such possible hypotheses, but also a thorough knowledge
of the facts in all other possible aspects. Further, care should be exercised
from the commencement in the selection of the variables between which the
correlation shall be determined. The variables should be defined in such a
way as to render the correlations as readily interpretable as possible, and,
if several are to be dealt with, they should afford the answers to specific and
definite questions. Unfortunately, the field of choice is frequently very
much limited, by deficiencies in the available data and so forth, and
consequently practical possibilities as well as ideal requirements have to be
taken into account. No general rules can be laid down, but the following
are given as illustrations of the sort of points that have to be considered.

15.2. Example 15.1. It is required to throw some light on the
variations of pauperism in the unions (unions of parishes) of England.
(Cf. Yule, ref. (834); investigation carried out in 1898.)

On the whole, it would seem best to correlate changes in pauperism with
changes in various possible factors. If we say that a high rate of pauperism
in some district is due to lax administration, we presumably mean that
as administration became lax, pauperism rose, or that if administration
were more strict, pauperism would decrease; if we say that the high
pauperism is due to the depressed condition of industry, we mean that
when industry recovers pauperism will fall. When we say, in fact, that
any one variable is a factor of pauperism, we mean that changes in that
variable are accompanied by changes in the percentage of the population in
receipt of relief, either in the same or the reverse direction. It will be
better, therefore, to deal with changes in pauperism and possible factors.
The next question is what factors to choose.

15.3. The possible factors may be grouped under three heads:

(a) Administration. Changes in the method or strictness of administration
of the law.

(b) Environment. Changes in economic conditions (wages, prices,
employment), social conditions (residential or industrial character of the
district, density of population, nationality of population) or moral
conditions (as illustrated, e.g., by the statistics of crime).

(c) Age Distribution. The percentage of the population between given
age-limits in receipt of relief increases very rapidly with old age, the actual
figures given by one of the only two then existing returns of the age of
paupers being: 2 per cent. under age 10, 1 per cent. over 10 but under 65,
20 per cent. over 65. (Return 30, 1890.)

It is practically impossible to deal with more than three factors, one
from each of the above groups, or four variables altogether, including the
pauperism itself. What shall we take, then, as representative variables,
and how shall we best measure “pauperism”?

15.4. Pauperism. The returns give (a) cost, (b) numbers relieved.
It seems better to deal with (b), as numbers are more important than cost
from the standpoint of the moral effect of relief on the population. The
returns, however, generally include both lunatics and vagrants in the totals
of persons relieved; and as the administrative methods of dealing with these
two classes differ entirely from the methods applicable to ordinary pauperism,
it seems better to alter the official total by excluding them. Returns are
available giving the numbers in receipt of relief on 1st January and 1st July;
there does not seem to be any special reason for taking the one return
rather than the other, but the return for 1st January was actually used.
The percentage of the population in receipt of relief on 1st January 1871,
1881 and 1891 (the three census years), less lunatics and vagrants, was
therefore tabulated for each union.

15.5. Administration. The most important point here, and one that
lends itself readily to statistical treatment, is the relative proportion of
indoor and outdoor relief (relief in the workhouse and relief in the
applicant's home). The first question is, again, shall we measure this proportion
by cost or by numbers? The latter seems, as before, the simpler and more
important ratio for the present purpose, though some writers have
preferred the statement in terms of expenditure (e.g. Charles Booth, “Aged
Poor - Condition,” 1894). If we decide on the statement in terms of
numbers, we still have the choice of expressing the proportion (1) as the
ratio of numbers given out-relief to numbers in the workhouse, or (2) as
the percentage of numbers given out-relief on the total number relieved.
The former method was chosen, partly on the simple ground that it had
already been used in an earlier investigation, partly on the ground that the
use of the ratio separates the higher proportions of out-relief more clearly
from each other, and these differences seem to have significance. Thus a
union with a ratio of 15 outdoor paupers to 1 indoor seems to be
materially different from one with a ratio of, say, 10 to 1; but if we take, instead
of the ratios, the percentages of outdoor to total paupers, the figures are
94 per cent. and 91 per cent. respectively, which are so close that they will
probably fall into the same array. The ratio of numbers in receipt of
outdoor relief to the numbers in the workhouse, in every union, was therefore
tabulated for 1st January in the census years 1871, 1881, 1891.
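
The compression of the higher ratios when they are expressed as percentages is a matter of simple arithmetic, sketched below; the function name is ours and is used for illustration only.

```python
def outdoor_percentage(ratio):
    """Percentage of outdoor paupers on the total relieved, given the
    ratio of outdoor paupers to 1 indoor pauper."""
    return 100 * ratio / (ratio + 1)

# The two unions compared in the text.
pct_15_to_1 = outdoor_percentage(15)   # 93.75, i.e. 94 per cent.
pct_10_to_1 = outdoor_percentage(10)   # about 90.9, i.e. 91 per cent.

# A wider spread of ratios, to show how the upper end is crowded together.
spread = [(r, round(outdoor_percentage(r), 1)) for r in (1, 2, 5, 10, 15, 20)]
```

A change of ratio from 10 to 15 moves the percentage by only about three points, whereas a change from 1 to 2 moves it by nearly seventeen.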

15.6. Environment. This is the most difficult factor of all to deal
with. In Booth’s work the factors tabulated were (1) persons per acre; (2)
percentage of population living two or more to a room, i.e. “overcrowding”;
(3) rateable value per head (“Aged Poor - Condition”). The data relating
to overcrowding were first collected at the census of 1891, and are not
available for earlier years. Some trial was made of rateable value per head,
but with not very satisfactory results. For any given year, and for a group
of unions of somewhat similar character, e.g. rural, the rateable value per
head appears to be highly (negatively) correlated with the pauperism, but
changes in the two are not very highly correlated: probably the
movements of assessments are sluggish and irregular, especially in the case of
falling assessments in rural unions, and do not correspond at all accurately
with the real changes in the value of agricultural land. After some
consideration, it was decided to use a very simple index to the changing
fortunes of a district, viz. the movement of the population itself. If the
population of a district is increasing at a rate above the average, this is
prima facie evidence that its industries are prospering; if the population
is decreasing, or not increasing as fast as the average, this strongly suggests
that the industries are suffering from a temporary lack of prosperity or
permanent decay. The population of every union was therefore tabulated
for the censuses of 1871, 1881, 1891.

15.7. Age Distribution. As already stated, the figures that are known
clearly indicate a very rapid rise of the percentage relieved after 65 years
of age. The percentage of the population over 65 years of age was
therefore worked out for every union and tabulated from the same three censuses.
This is not, of course, at all a complete index to the composition of the
population as affecting the rate of pauperism, which is sensibly dependent
on the proportion of the two sexes, and the numbers of children as well.
As the percentage in receipt of relief was, however, 20 per cent. for those
over 65, and only 1 to 2 per cent. for those under that age, it is evidently a
most important index. (A more complete method might have been used
by correcting the observed rate of pauperism to the basis of a standard
population with given numbers of each age and sex (cf. Chap. 10, pages
805-800).)

15.8. The changes in each of the four quantities that had been
tabulated for every union were then measured by working out the ratios
for the intercensal decades 1871-81 and 1881-91, taking the value in the
earlier year as 100 in each case. The percentage ratios so obtained were
taken as the four variables. Further, as the conditions are and were very
different for rural and for urban unions, it seemed very desirable to separate
the unions into groups according to their character. But this cannot be
done with any exactness: the majority of unions are of a mixed character,
consisting, say, of a small town with a considerable extent of the
surrounding country. It might seem best to base the classification on returns of
occupations, e.g. the proportions of the population engaged in agriculture,
but the statistics of occupations are not given in the census for individual
unions. Finally, it was decided to use a classification by density of
population, the grouping used being: Rural, 0.3 person per acre or less; Mixed,
more than 0.3 but not more than 1 person per acre; Urban, more than 1
person per acre. The metropolitan unions were also treated by themselves.
The limit 0.3 for rural unions was suggested by the density of those
agricultural unions the conditions in which were investigated by the Labour
Commission which reported in 1891: the average density of these was 0.25,
and 34 of the 38 were under 0.3. The lower limit of density for urban
unions, 1 per acre, was suggested by a grouping of Booth’s (group xiv.):
of course 1 person per acre is not a density associated with an urban district
in the ordinary sense of the term, but a country district cannot reach this
density unless it includes a small town or portion of a town, i.e. unless a
large proportion of its inhabitants live under urban conditions.
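
The two operations described in this paragraph, the percentage ratio for a decade and the grouping by density, can be sketched as follows. The function names and the sample figures are hypothetical; the rural limit is passed explicitly so that the value read from the text can be supplied by the user, and the separate treatment of the metropolitan unions is not represented.

```python
def percentage_ratio(earlier, later):
    """Value in the later census year, taking the earlier year as 100."""
    return 100 * later / earlier

def density_group(persons_per_acre, rural_limit, urban_limit=1.0):
    """Classify a union by density of population.

    A union at or below `rural_limit` is Rural; above it but not above
    `urban_limit` (1 person per acre in the text) it is Mixed; otherwise Urban.
    """
    if persons_per_acre <= rural_limit:
        return "Rural"
    if persons_per_acre <= urban_limit:
        return "Mixed"
    return "Urban"

# Invented figures for a single union: pauperism 4.0 per cent. at the
# earlier census and 3.0 per cent. at the later one.
pauperism_ratio = percentage_ratio(4.0, 3.0)     # 75.0

# Invented densities, classified with a rural limit supplied by the user.
groups = [density_group(d, rural_limit=0.3) for d in (0.25, 0.5, 2.0)]
```

A percentage ratio below 100 thus denotes a fall over the decade and one above 100 a rise, whatever the absolute level from which the union started.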

15.9. Example 15.2. The subject of investigation is the inheritance
of fertility in man. (Cf. Pearson and others, ref. (i!2;i).)

Fertility in man (i.e. the number of children born to a given pair) is very
largely influenced by the age of husband and wife at marriage (especially
the latter), and by the duration of marriage. It is desired to find whether
it is also influenced by the heritable constitution of the parents, i.e. whether,
allowance being made for the effect of such disturbing causes as age and
duration of marriage, fertility is itself a heritable character.

The effect of duration of marriage may be largely eliminated by
excluding all marriages which have not lasted, say, 15 years at least. This will
rather heavily reduce the number of records available, but will leave a
sufficient number for discussion. It would be desirable to eliminate the
effect of late marriages in the same way by excluding all cases in which,
say, husband was over 30 years of age or wife over 25 (or even less) at the
time of marriage. But, unfortunately, this is impossible; the age of the
wife, the most important factor, is only exceptionally given in peerages,
family histories and similar works, from which the data must be compiled.
All marriages lasting 15 years or more must therefore be included, whatever
the age of the parents at marriage, and the effect of the varying age at
marriage must be estimated afterwards.

15.10. But the correlation between (1) number of children of a
woman and (2) number of children of her daughter will be further affected
according as we include in the record all her available daughters or only one.
Suppose, e.g., the number of children in the first generation is 5 (say the
mother and her brothers and sisters), and the mother has three daughters
with 0, 2 and 4 children respectively: are we to enter all three pairs (5, 0),
(5, 2), (5, 4) in the correlation table, or only one pair? If the latter, which
pair? For theoretical simplicity the second process is distinctly the better
(though it still further limits the available data). If it be adopted, some
regular rule will have to be made for the selection of the daughter whose
fertility shall be entered in the table, so as to avoid bias: the first daughter
married for whom data are given, and who fulfils the conditions as to
duration of marriage, may, for instance, be taken in every case. (For a
much more detailed discussion of the problem, and the allied problems
regarding the inheritance of fertility in the horse, the student is referred to
the original.)

15.11. Example 15.3. The subject for investigation is the relation
between the bulk of a crop (wheat and other cereals, turnips and other root
crops, hay, etc.) and the weather. (Cf. Hooker, ref. (310).)

Produce statistics for the more important crops of Great Britain have
been issued by the Ministry of Agriculture since 1885: the figures are based
on estimates of the yield furnished by official local estimators all over the
country. Estimates are published for separate counties and for groups of
counties (divisions). The climatic conditions vary so much over the United
Kingdom that it is best to deal with a limited area, homogeneous as far as
possible from the meteorological standpoint. On the other hand, the area
should not be too small; it should be large enough to present a
representative variety of soil. The group of eastern counties, consisting of Lincoln,
Hunts, Cambridge, Norfolk, Suffolk, Essex, Bedford and Hertford, was
selected as fulfilling these conditions. The group includes the county with
the largest acreage of each of the ten crops investigated, with the single
exception of permanent grass.

15.12. The produce of a crop is dependent on the weather of a long
preceding period, and it is naturally desired to find the influence of the
weather at successive stages during this period, and to determine, for
each crop, which period of the year is of most critical importance as regards
weather. It must be remembered, however, that the times of both sowing
and harvest are themselves very largely dependent on the weather, and
consequently, on an average of many years, the limits of the critical period
will not be very well defined. If, therefore, we correlate the produce of the
crop (X) with the characteristics of the weather (Y) during successive
intervals of the year, it will be as well not to make these intervals too short.
It was accordingly decided to take successive groups of 8 weeks,
overlapping each other by 4 weeks, i.e. weeks 1-8, 5-12, etc. Correlation coefficients
were thus obtained at 4-week intervals, but based on 8 weeks' weather.
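
The scheme of overlapping periods may be set out mechanically. The sketch below is an illustration only: the function name and the assumption of a 52-week year are ours.

```python
def overlapping_periods(n_weeks, length=8, step=4):
    """Successive (first_week, last_week) groups of `length` weeks, each
    beginning `step` weeks after the one before."""
    return [(start, start + length - 1)
            for start in range(1, n_weeks - length + 2, step)]

# Assuming a year of 52 weeks.
periods = overlapping_periods(52)
first_two = periods[:2]     # weeks 1-8 and 5-12, as in the text
last = periods[-1]          # weeks 45-52
```

Each week after the first four and before the last four falls in exactly two periods, so that successive coefficients are not independent of one another.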

15.13. It remains to be decided what characteristics of the weather
are to be taken into account. The rainfall is clearly one factor of great
importance, temperature is another, and these two will afford quite enough
labour for a first investigation. The weekly rainfalls were averaged for
eight stations within the area, and the average taken as the first
characteristic of the weather. Temperatures were taken from the records of the
same stations. The average temperatures, however, do not give quite the
sort of information that is required: at temperatures below a certain limit
(about 42° Fahr.) there is very little growth, and the growth increases in
rapidity as the temperature rises above this point (within limits). It was
therefore decided to utilise the figures for “accumulated temperatures
above 42° Fahr.,” i.e. the total number of day-degrees above 42° during
each of the 8-weekly periods, as the second characteristic of the weather;
these “accumulated temperatures,” moreover, show much larger variations
than mean temperatures.

The student should refer to the original for the full discussion as to data.

The Variate-difference Correlation Method.

15.14. Problems of a somewhat special kind arise when dealing with
the relations between simultaneous values of two variables which have been
observed during a considerable period of time, for the more rapid movements
will often exhibit a fairly close consilience, while the slower changes
show no similarity. The two following examples will serve as illustrations
of two methods which are generally applicable to such cases:

[Fig. 15.1: (1) infantile mortality and (2) general mortality in England and
Wales, plotted against years.]

Example 15.4. Fig. 15.1 exhibits the movements of (1) the infantile
mortality (deaths of infants under 1 year of age per 1000 births in the same
year), (2) the general mortality (deaths at all ages per 1000 living), in
England and Wales during the period 1838-1914. A very cursory inspection
of the figure shows that when the infantile mortality rose from
one year to the next the general mortality also rose, as a rule; and similarly,
when the infantile mortality fell, the general mortality also fell. There
were, in fact, only seven or eight exceptions to this rule during the whole
period under review. The correlation between the annual values of the
two mortalities would nevertheless not be very high, as the general mortality
has been falling more or less steadily since 1875 or thereabouts, while the
infantile mortality attained almost a record value in 1899. During a long
period of time the correlation between annual values may, indeed, very well
vanish, for the two mortalities are affected by causes which are to a large
extent different in the two cases. To exhibit, therefore, the closeness of
the relation between infantile and general mortality, for such causes as show
marked changes between one year and the next, it will be best to proceed by
correlating the annual changes, and not the annual values. The work
would be arranged in the following form (only sufficient years being given
to exhibit the principle of the process), and the correlation worked out
between the figures of columns 3 and 5:


  1.          2.                3.                4.                5.
          Infantile        Increase or         General         Increase or
 Year.  Mortality per     Decrease from     Mortality per     Decrease from
         1000 Births.      Year before.      1000 living.      Year before.

 1838       159                ..               22.4               ..
 1839       151                -8               21.8              -0.6
 1840       154                +3               22.9              +1.1
 1841       145                -9               21.6              -1.3
 1842       152                +7               21.7              +0.1
 1843       150                -2               21.2              -0.5

For the period to which the diagram refers, viz. 1838-1914, the following
constants were found by this method:

Infantile mortality, mean annual change      -0.71
    ,,        ,,   , standard deviation      10.70

General mortality, mean annual change        -0.11
    ,,       ,,   , standard deviation        1.13

Coefficient of correlation \ 0-00 

This is a much higher correlation than would arise from the mere fact
that the deaths of infants form part of the general mortality, and
consequently there must be a high correlation between the annual changes in
the mortality of those who are over and under 1 year of age, respectively.
(Cf. Exercise 10.0, page 308.) 

15.15. The procedure of the foregoing section has been called the
“variate-difference correlation method.” By taking first differences
instead of the variate values themselves, the slower changes of the two
variates with time are to some extent eliminated, and we are able to study
the effect of short-term variations. To eliminate the secular changes more
completely it may be desirable to proceed to second differences, i.e. to work
out the successive differences of the differences in column 3 and column 5
before correlating. It may even be desirable to proceed to third, fourth
or higher differences before correlating. The method should, however, be
used with caution in such cases, particularly with short series. Correlation
coefficients obtained from higher differences are not always reliable, and
their interpretation becomes a matter of considerable difficulty.
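The arithmetic of the variate-difference method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six yearly figures are those of the specimen table of Example 15.4 as reconciled with its difference columns, and the helper names `pearson` and `differences` are introduced here and do not come from the text.

```python
from math import sqrt


def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def differences(series, order=1):
    """Successive differences; order=2 gives differences of differences."""
    values = list(series)
    for _ in range(order):
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


# Specimen years 1838-43 only (far too few for a serious coefficient).
infantile = [159, 151, 154, 145, 152, 150]      # per 1000 births
general = [22.4, 21.8, 22.9, 21.6, 21.7, 21.2]  # per 1000 living

changes_infantile = differences(infantile)  # column 3 of the table
changes_general = differences(general)      # column 5 of the table
r_changes = pearson(changes_infantile, changes_general)
```

Passing `order=2` to `differences` gives the second differences mentioned above; each further order shortens the series by one term, which is one reason for caution with short series.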

15.16. Example 15.5. The two curves of fig. 15.2 show (1) the
marriage-rate (persons married per 1000 of the population) for England and
Wales; (2) the values of exports and imports per head of the population
of the United Kingdom for every year from 1855 to 1901. Inspection of
the diagram suggests a similar relation to that of the last example, the one
variable showing a rise from one year to the next when the other rises, and
a fall when the other falls. The movement of both variables is, however,
of a much more regular kind than that of mortality, resembling a series of
“waves” superposed on a steady general trend, and it is the “waves” in
the two variables (the short-period movements, not the slower trends)
which are so clearly related.

15.17. It is not difficult, moreover, to separate the short-period
oscillations, more or less approximately, from the slower movement.
Suppose the marriage-rate for each year replaced by the average of an odd
number of years of which it is the centre, the number being as near as may
be the same as the period of the “waves,” e.g. nine years. If these
short-period averages were plotted on the diagram instead of the rates of the
individual years, we should evidently obtain a smoother curve which would
clearly exhibit the trend and be practically free from the conspicuous waves.
The excess or defect of each annual rate above or below the trend, if plotted
separately, would therefore give the “waves” apart from the slower
changes. The figures for foreign trade may be treated in the same way as
the marriage-rate, and we can accordingly work out the correlation between
the waves or rapid fluctuations, undisturbed by the movements of longer
period, however great they may be. The arithmetic may be carried out
in the form of the following table, and the correlation worked out in the
ordinary way between the figures of columns 4 and 7:


  1.        2.            3.         4.           5.              6.         7.
        Marriage-rate    Nine                Exports + Im-       Nine
 Year.    (England       Years'    Differ-   ports, £'s per      Years'    Differ-
         and Wales).    Average.   ence.      head (U.K.).      Average.   ence.

 1855      16.2           ..        ..           9.30             ..         ..
 1856      16.7           ..        ..          11.14             ..         ..
 1857      16.5           ..        ..          11.85             ..         ..
 1858      16.0           ..        ..          10.78             ..         ..
 1859      17.0          16.5      +0.5         11.72            12.15      -0.43
 1860      17.1          16.6      +0.5         13.03            12.94      +0.09
 1861      16.3          16.7      -0.4         13.01            13.52      -0.51
 1862      16.1          16.8      -0.7         13.40            14.17      -0.77
 1863      16.8          16.9      -0.1         15.13            14.81      +0.32
 1864      17.2           ..        ..          16.43             ..         ..
 1865      17.6           ..        ..          16.37             ..         ..
 1866      17.5           ..        ..          17.72             ..         ..
 1867      16.5           ..        ..          16.47             ..         ..









15.18. Fig. 15.3 is drawn from the figures of columns 4 and 7, and
shows very well how closely the oscillations of the marriage-rate are related
to those of trade. For the period 1861-95 the correlation between the two
oscillations (Hooker, ref. (814)) is 0.80. The method may obviously be
extended by correlating the deviation of the marriage-rate in any one year
with the deviation of the exports and imports of the year before, or two
years before, instead of the same year; if a sufficient number of years be
taken, an estimate may be made, by interpolation, of the time-difference
that would make the correlation a maximum if it were possible to obtain
the figures for exports and imports for periods other than calendar years.
Thus Hooker found (ref. (814)) that on an average of the years 1861-95
the correlation would be a maximum between the marriage-rate and the
foreign trade of about one-third of a year earlier. The method is an
extremely useful one, and is obviously applicable to any similar case.
Reference may be made to ref. (885), in which several diagrams are
given similar to fig. 15.3, and the nature of the relationship between the
marriage-rate and such factors as trade, unemployment, etc., is discussed,
it being suggested that the relation is even more complex than appears
from the above.

Fig. 15.3. Fluctuations in (1) Marriage-rate and (2) Foreign Trade (Exports + Imports
per head) in England and Wales: the Curves show Deviations from 9-year Means.
(Data of R. H. Hooker, Jour. Roy. Stat. Soc., 1901.)
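The procedure of 15.17 and 15.18 (nine-year centred averages, deviations from them, and correlation of the deviations with or without a lag) may be sketched as follows. The figures are those of the table for 1855-67 as read from it; the function names are hypothetical, and with only five usable years the coefficient is a demonstration of the arithmetic, not an estimate.

```python
from math import sqrt


def centred_average(values, span=9):
    """Centred moving average; None where the window is incomplete."""
    half = span // 2
    out = [None] * len(values)
    for i in range(half, len(values) - half):
        out[i] = sum(values[i - half:i + half + 1]) / span
    return out


def deviations(values, span=9):
    """Excess or defect of each value over its centred average."""
    averages = centred_average(values, span)
    return [None if a is None else v - a for v, a in zip(values, averages)]


def lagged_pairs(x, y, lag):
    """Pair x[t] with y[t - lag], skipping years without a deviation."""
    pairs = []
    for t in range(len(x)):
        s = t - lag
        if 0 <= s < len(y) and x[t] is not None and y[s] is not None:
            pairs.append((x[t], y[s]))
    return pairs


def pearson(pairs):
    """Product-moment correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)


# Years 1855-1867, columns 2 and 5 of the table.
marriage = [16.2, 16.7, 16.5, 16.0, 17.0, 17.1, 16.3,
            16.1, 16.8, 17.2, 17.6, 17.5, 16.5]
trade = [9.30, 11.14, 11.85, 10.78, 11.72, 13.03, 13.01,
         13.40, 15.13, 16.43, 16.37, 17.72, 16.47]

dev_marriage = deviations(marriage)  # column 4
dev_trade = deviations(trade)        # column 7
r_same_year = pearson(lagged_pairs(dev_marriage, dev_trade, lag=0))
```

Calling `lagged_pairs(dev_marriage, dev_trade, lag=1)` pairs each marriage-rate deviation with the trade deviation of the year before, which is the extension described in 15.18; repeating this for several lags gives the values between which one would interpolate.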



CHAPTER 16.


MISCELLANEOUS THEOREMS INVOLVING THE USE 
OF THE CORRELATION COEFFICIENT. 

Algebraical Convenience of the Correlation Coefficient. 

16.1. It has already been pointed out that a statistical measure, if 
it is to be widely useful, should lend itself readily to algebraical treatment. 
The arithmetic mean and the standard deviation derive their importance 
largely from the fact that they fulfil this requirement better than any other 
averages or measures of dispersion ; and the following illustrations, while 
giving a number of results that are of value in one branch or another
of statistical work, suffice to show that the correlation coefficient can be 
treated with the same facility. This might indeed be expected, seeing 
that the coefficient is derived, like the mean and standard deviation, by a 
straightforward process of summation. 

The Standard Deviation of the Sum or Difference of Variables. 

16.2. Let $X_1$, $X_2$ be two variables, and $Z$ stand for their sum or
difference.

Let $z$, $x_1$, $x_2$ denote deviations of the several variables from their
arithmetic means. Then, if

$$Z = X_1 \pm X_2$$

evidently

$$z = x_1 \pm x_2$$

Squaring both sides of the equation and summing,

$$S(z^2) = S(x_1^2) + S(x_2^2) \pm 2S(x_1 x_2)$$

That is, if $r$ be the correlation between $x_1$ and $x_2$, and $\sigma$, $\sigma_1$, $\sigma_2$ the respective
standard deviations,

$$\sigma^2 = \sigma_1^2 + \sigma_2^2 \pm 2r\sigma_1\sigma_2 \tag{16.1}$$

If $x_1$ and $x_2$ are uncorrelated, we have the important special case

$$\sigma^2 = \sigma_1^2 + \sigma_2^2 \tag{16.2}$$

The student should notice that in this case the standard deviation of
the sum of corresponding values of the two variables is the same as the
standard deviation of their difference.

The same process will evidently give the standard deviation of a linear
function of any number of variables. For the sum of a series of variables
$X_1, X_2, \ldots X_N$, we must have:

$$\sigma^2 = \sigma_1^2 + \sigma_2^2 + \ldots + \sigma_N^2 + 2r_{12}\sigma_1\sigma_2 + 2r_{13}\sigma_1\sigma_3 + \ldots + 2r_{23}\sigma_2\sigma_3 + \ldots$$

$r_{12}$ being the correlation between $X_1$ and $X_2$, $r_{23}$ the correlation between
$X_2$ and $X_3$, and so on.
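Equation (16.1) is an exact identity for any finite set of paired values, provided the standard deviations are computed with divisor N as elsewhere in this book. The short check below illustrates this on two made-up series; the data and function names are assumptions chosen purely for illustration.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Standard deviation with divisor N."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def corr(a, b):
    """Product-moment correlation coefficient."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab / (len(a) * sd(a) * sd(b))


def variance_direct_and_by_formula(x1, x2, sign):
    """Variance of x1 + sign*x2, computed directly and by (16.1)."""
    z = [a + sign * b for a, b in zip(x1, x2)]
    direct = sd(z) ** 2
    s1, s2, r = sd(x1), sd(x2), corr(x1, x2)
    formula = s1 ** 2 + s2 ** 2 + sign * 2 * r * s1 * s2
    return direct, formula


# Hypothetical paired observations.
x1 = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
x2 = [1.0, 3.0, 2.0, 6.0, 5.0, 7.0]

sum_direct, sum_formula = variance_direct_and_by_formula(x1, x2, +1)
diff_direct, diff_formula = variance_direct_and_by_formula(x1, x2, -1)
```

The two members of each pair agree to rounding error, for the sum and for the difference alike.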


Influence of Errors of Observation on the Standard Deviation. 

16.3. The results of 16.2 may be applied to the theory of errors of
observation. Let us suppose that, if any value of $X$ be observed a large
number of times, the arithmetic mean of the observations is approximately
the true value, the arithmetic mean error being zero. Then, the arithmetic
mean error being zero for all values of $X$, the error, say $\delta$, is uncorrelated
with $X$. In this case if $x_1$ be an observed deviation from the arithmetic
mean, and $x$ the true deviation, we have from the preceding:

$$\sigma_{x_1}^2 = \sigma_x^2 + \sigma_\delta^2 \tag{16.3}$$

The effect of errors of observation is, consequently, to increase the standard
deviation above its true value. The student should notice that the
assumption made does not imply the complete independence of $X$ and $\delta$: he
is quite at liberty to suppose that errors fluctuate more, for example, with
large than with small values of $X$, as might very probably happen. In
that case the contingency coefficient between $X$ and $\delta$ would not be zero,
although the correlation coefficient might still vanish as supposed.

16.4. If certain observations be repeated so that we have in every
case two measures $x_1$ and $x_2$ of the same deviation $x$, it is possible to obtain
the true standard deviation $\sigma_x$ if the further assumption is legitimate that
the errors $\delta_1$ and $\delta_2$ are uncorrelated with each other. On this assumption

$$S(x_1 x_2) = S(x + \delta_1)(x + \delta_2) = S(x^2)$$

and accordingly

$$\sigma_x^2 = \frac{S(x_1 x_2)}{N} \tag{16.4}$$

(This formula is part of Spearman’s formula for the correction of the
correlation coefficient; cf. 16.6.)


Influence of Errors of Observation on the Correlation Coefficient. 

16.5. Let $x_1$, $y_1$ be the observed deviations from the arithmetic means,
$x$, $y$ the true deviations, and $\delta$, $\epsilon$ the errors of observation. Of the four
quantities $x$, $y$, $\delta$, $\epsilon$ we will suppose $x$ and $y$ alone to be correlated. On this
assumption

$$S(x_1 y_1) = S(xy) \tag{16.5}$$

It follows at once that

$$r_{x_1 y_1} = r_{xy}\,\frac{\sigma_x \sigma_y}{\sigma_{x_1}\sigma_{y_1}}$$

and consequently, since by (16.3) $\sigma_{x_1}$ and $\sigma_{y_1}$ exceed $\sigma_x$ and $\sigma_y$, the observed
correlation is less than the true correlation. This difference, it should be
noticed, no mere increase in the number of observations can in any way lessen.





Spearman’s Theorems. 

16.6. If, however, the observations of both $x$ and $y$ be repeated, as
assumed in 16.4, so that we have two measures $x_1$ and $x_2$, $y_1$ and $y_2$ of every
value of $x$ and $y$, the true value of the correlation can be obtained by the
use of equations (16.4) and (16.5), on assumptions similar to those made
above. For we have:

$$r_{xy}^2 = \frac{S(x_1 y_1)\,S(x_2 y_2)}{S(x_1 x_2)\,S(y_1 y_2)} = \frac{S(x_1 y_2)\,S(x_2 y_1)}{S(x_1 x_2)\,S(y_1 y_2)}$$

$$\phantom{r_{xy}^2} = \frac{r_{x_1 y_1}\, r_{x_2 y_2}}{r_{x_1 x_2}\, r_{y_1 y_2}} = \frac{r_{x_1 y_2}\, r_{x_2 y_1}}{r_{x_1 x_2}\, r_{y_1 y_2}} \tag{16.6}$$

Or, if we use all the four possible correlations between observed values of
$x$ and observed values of $y$,

$$r_{xy} = \frac{\left(r_{x_1 y_1}\, r_{x_1 y_2}\, r_{x_2 y_1}\, r_{x_2 y_2}\right)^{1/4}}{\left(r_{x_1 x_2}\, r_{y_1 y_2}\right)^{1/2}} \tag{16.7}$$


Equation (16.7) is the original form in which Spearman gave his
correction formula (refs. (339) and (340)). It will be seen to imply the
assumption that, of the six quantities $x$, $y$, $\delta_1$, $\delta_2$, $\epsilon_1$, $\epsilon_2$, only $x$ and $y$ are
correlated. The correction given by the second part of equation (16.6),
also suggested by Spearman, seems, on the whole, to be safer, for it
eliminates the assumption that the errors in $x$ and in $y$, in the same series
of observations, are uncorrelated. An insufficient though partial test of
the correctness of the assumptions may be made by correlating $x_1 - x_2$ with
$y_1 - y_2$: this correlation should vanish. Evidently, however, it may
vanish from symmetry without thereby implying that all the correlations
of the errors are zero.
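The attenuation of 16.5 and the correction by the second part of (16.6) can be illustrated by a small simulation. Everything in it is an assumption made for the illustration: normally distributed true values with correlation 0.6, independent normal errors of standard deviation 0.7 on each measure, 20,000 pairs and a fixed random seed.

```python
import random
from math import sqrt


def corr(a, b):
    """Product-moment correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


random.seed(1)
N = 20000
TRUE_R = 0.6     # assumed true correlation between x and y
ERROR_SD = 0.7   # assumed standard deviation of every error of observation

x, y = [], []
for _ in range(N):
    u = random.gauss(0, 1)
    v = random.gauss(0, 1)
    x.append(u)
    y.append(TRUE_R * u + sqrt(1 - TRUE_R ** 2) * v)


def observe(values):
    """One series of observations: true value plus an independent error."""
    return [t + random.gauss(0, ERROR_SD) for t in values]


x1, x2 = observe(x), observe(x)
y1, y2 = observe(y), observe(y)

r_observed = corr(x1, y1)  # attenuated, as in 16.5
r_corrected = sqrt(corr(x1, y2) * corr(x2, y1)
                   / (corr(x1, x2) * corr(y1, y2)))  # second part of (16.6)
```

With these assumed variances the observed coefficient should lie near 0.6/1.49, i.e. about 0.40, while the corrected value should return to the neighbourhood of 0.6, subject to sampling fluctuation.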


Mean and Standard Deviation of an Index. 

16.7. The means and standard deviations of non-linear functions of
two or more variables can in general only be expressed in terms of the means
and standard deviations of the original variables to a first approximation,
on the assumption that deviations are small compared with the mean values
of the variables. Thus, let it be required to find the mean and standard
deviation of a ratio or index $Z = X_1/X_2$ in terms of the constants for $X_1$ and
$X_2$. Let $I$ be the mean of $Z$, $M_1$ and $M_2$ the means of $X_1$ and $X_2$. Then,

$$I = \frac{1}{N}\,S\!\left(\frac{X_1}{X_2}\right) = \frac{1}{N}\,\frac{M_1}{M_2}\,S\!\left[\left(1 + \frac{x_1}{M_1}\right)\left(1 + \frac{x_2}{M_2}\right)^{-1}\right]$$

Expand the second bracket by the binomial theorem, assuming that
$x_2/M_2$ is so small that powers higher than the second can be neglected.
Then, to this approximation,

$$I = \frac{1}{N}\,\frac{M_1}{M_2}\,S\!\left[\left(1 + \frac{x_1}{M_1}\right)\left(1 - \frac{x_2}{M_2} + \frac{x_2^2}{M_2^2}\right)\right] = \frac{M_1}{M_2}\left[1 - \frac{S(x_1 x_2)}{N M_1 M_2} + \frac{S(x_2^2)}{N M_2^2}\right]$$

That is, if $r$ be the correlation between $x_1$ and $x_2$, and if $v_1 = \sigma_1/M_1$, $v_2 = \sigma_2/M_2$,

$$I = \frac{M_1}{M_2}\left(1 - r v_1 v_2 + v_2^2\right) \tag{16.8}$$




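If $s$ be the standard deviation of $Z$, we have:

$$s^2 + I^2 = \frac{1}{N}\,S\!\left(\frac{X_1^2}{X_2^2}\right) = \frac{1}{N}\,\frac{M_1^2}{M_2^2}\,S\!\left[\left(1 + \frac{x_1}{M_1}\right)^{2}\left(1 + \frac{x_2}{M_2}\right)^{-2}\right]$$

Expanding the second bracket again by the binomial theorem, and neglecting
terms of all orders above the second:

$$s^2 + I^2 = \frac{1}{N}\,\frac{M_1^2}{M_2^2}\,S\!\left[\left(1 + \frac{x_1}{M_1}\right)^{2}\left(1 - \frac{2x_2}{M_2} + \frac{3x_2^2}{M_2^2}\right)\right] = \frac{M_1^2}{M_2^2}\left(1 + v_1^2 - 4 r v_1 v_2 + 3 v_2^2\right)$$

or from (16.8):

$$s^2 = \frac{M_1^2}{M_2^2}\left(v_1^2 - 2 r v_1 v_2 + v_2^2\right) \tag{16.9}$$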
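How close the approximations (16.8) and (16.9) come to the exact values may be judged from a small made-up case. In the sketch below every combination of three values of X1 with three values of X2 occurs once, so that r = 0 and the exact mean and standard deviation of the ratio can be found by enumeration; the values themselves are hypothetical.

```python
from itertools import product
from math import sqrt


def mean_and_sd(values):
    """Arithmetic mean and standard deviation (divisor N)."""
    n = len(values)
    m = sum(values) / n
    return m, sqrt(sum((v - m) ** 2 for v in values) / n)


# Hypothetical variables with small coefficients of variation.
x1_values = [95.0, 100.0, 105.0]
x2_values = [48.0, 50.0, 52.0]

# Every pairing occurs once, so X1 and X2 are uncorrelated (r = 0).
pairs = list(product(x1_values, x2_values))
r = 0.0

m1, s1 = mean_and_sd([a for a, _ in pairs])
m2, s2 = mean_and_sd([b for _, b in pairs])
v1, v2 = s1 / m1, s2 / m2

# Exact mean and standard deviation of the index Z = X1/X2.
index_mean_exact, index_sd_exact = mean_and_sd([a / b for a, b in pairs])

# Second-order approximations (16.8) and (16.9).
index_mean_approx = (m1 / m2) * (1 - r * v1 * v2 + v2 ** 2)
index_sd_approx = (m1 / m2) * sqrt(v1 ** 2 - 2 * r * v1 * v2 + v2 ** 2)
```

With coefficients of variation of a few per cent, as here, the approximate and exact values differ only slightly; the agreement deteriorates as v1 and v2 grow, since the neglected terms are of higher order in them.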


Correlation between Indices. 

16.8. The following problem affords a further illustration of the use of
the same method. Required to find approximately the correlation between
two ratios $Z_1 = X_1/X_3$, $Z_2 = X_2/X_3$, $X_1$, $X_2$ and $X_3$ being uncorrelated.

Let the means of the two ratios or indices be $I_1$, $I_2$, and the standard
deviations $s_1$, $s_2$; these are given approximately by (16.8) and (16.9) of
the last section. The required correlation $\rho$ will be given by

$$N \rho s_1 s_2 = S\!\left(\frac{X_1}{X_3} - I_1\right)\!\left(\frac{X_2}{X_3} - I_2\right) = S\!\left(\frac{X_1 X_2}{X_3^2}\right) - N I_1 I_2$$

Neglecting terms of higher order than the second as before and
remembering that all correlations are zero, we have:

$$\rho s_1 s_2 = \frac{M_1 M_2}{M_3^2}\left(1 + 3v_3^2\right) - I_1 I_2 = \frac{M_1 M_2}{M_3^2}\left[\left(1 + 3v_3^2\right) - \left(1 + v_3^2\right)^2\right] = \frac{M_1 M_2}{M_3^2}\,v_3^2$$

where, in the last step, a term of the order $v_3^4$ has again been neglected.
Substituting from (16.9) for $s_1$ and $s_2$, we have finally:

$$\rho = \frac{v_3^2}{\sqrt{\left(v_1^2 + v_3^2\right)\left(v_2^2 + v_3^2\right)}} \tag{16.10}$$


This value of $\rho$ is obviously positive, being equal to 0.5 if $v_1 = v_2 = v_3$;
and hence even if $X_1$ and $X_2$ are independent, the indices formed by taking
their ratios to a common denominator $X_3$ will be correlated. The value of
$\rho$ was termed by Karl Pearson the “spurious correlation.” Thus, if
measurements be taken, say, on three bones of the human skeleton, and the
measurements grouped in threes absolutely at random, there will, nevertheless,
be a positive correlation, probably approaching 0.5, between the
indices formed by the ratios of two of the measurements to the third. To
give another illustration, if two individuals both observe the same series
of magnitudes quite independently, there may be little, if any, correlation
between their absolute errors. But if the errors be expressed as percentages
of the magnitude observed, there may be considerable correlation.
It does not follow of necessity that the correlations between indices or
ratios are misleading. If the indices are uncorrelated, there will be
a similar “spurious” correlation between the absolute measurements
$Z_1 X_3 = X_1$ and $Z_2 X_3 = X_2$, and the answer to the question whether the
correlation between indices or that between absolute measures is misleading
depends on the further question whether the indices or the absolute
measures are the quantities directly determined by the causes under
investigation (cf. ref. (316)).

The case considered, where $X_1$, $X_2$, $X_3$ are uncorrelated, is only a
special one; for the general discussion cf. ref. (345). For an interesting
study of actual illustrations cf. ref. (313).
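The spurious correlation of (16.10) can be checked on a made-up case in which the three variables are exactly uncorrelated. In the sketch below each variable takes three hypothetical values and every one of the 27 combinations occurs once; the correlation between the two indices is then computed exactly and compared with the approximate formula.

```python
from itertools import product
from math import sqrt


def spurious_correlation(v1, v2, v3):
    """Approximate correlation (16.10) between X1/X3 and X2/X3."""
    return v3 ** 2 / sqrt((v1 ** 2 + v3 ** 2) * (v2 ** 2 + v3 ** 2))


def corr(a, b):
    """Product-moment correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


# Hypothetical values; all 27 triplets occur once, so that
# X1, X2 and X3 are uncorrelated in pairs.
levels = [95.0, 100.0, 105.0]
triplets = list(product(levels, levels, levels))

z1 = [a / c for a, _, c in triplets]  # index X1/X3
z2 = [b / c for _, b, c in triplets]  # index X2/X3

r_absolute = corr([a for a, _, _ in triplets], [b for _, b, _ in triplets])
r_indices = corr(z1, z2)

# All three coefficients of variation are equal here, so (16.10) gives 0.5.
v = sqrt(sum((x - 100.0) ** 2 for x in levels) / 3) / 100.0
r_formula = spurious_correlation(v, v, v)
```

The absolute measurements show no correlation, while the indices show a correlation very near one-half, in agreement with the formula.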

Correlation due to Heterogeneity of Material. 

16.9. The following theorem offers some analogy with the theorem of
4.12 for attributes: If X and Y are uncorrelated in each of two records, they
will nevertheless exhibit some correlation when the two records are mingled,
unless the mean value of X in the second record is identical with that in the first
record, or the mean value of Y in the second record is identical with that in the
first record, or both.

This follows almost at once, for if $M_1$, $M_2$ are the mean values of $X$ in
the two records, $K_1$, $K_2$ the mean values of $Y$, $N_1$, $N_2$ the numbers of
observations, and $M$, $K$ the means when the two records are mingled, the
product-sum of deviations about $M$, $K$ is

$$N_1(M_1 - M)(K_1 - K) + N_2(M_2 - M)(K_2 - K)$$

Evidently the first term can only be zero if $M = M_1$ or $K = K_1$. But
the first condition gives

$$\frac{N_1 M_1 + N_2 M_2}{N_1 + N_2} = M_1$$

that is,

$$M_1 = M_2$$

Similarly, the second condition gives $K_1 = K_2$. Both the first and second
terms can, therefore, only vanish if $M_1 = M_2$ or $K_1 = K_2$. Correlation may
accordingly be created by the mingling of two records in which X and Y
vary round different means. (For a more general form of the theorem
cf. ref. (323).)
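A minimal numerical illustration of the theorem follows. The two records are invented for the purpose: in each, X and Y are exactly uncorrelated, but the means of both variables differ between the records.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def corr(pairs):
    """Product-moment correlation of a list of (x, y) pairs."""
    mx = mean([x for x, _ in pairs])
    my = mean([y for _, y in pairs])
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


# Two hypothetical records, each with zero correlation between X and Y.
record_1 = [(0, 0), (0, 1), (1, 0), (1, 1)]
record_2 = [(5, 5), (5, 6), (6, 5), (6, 6)]
mingled = record_1 + record_2

M = mean([x for x, _ in mingled])
K = mean([y for _, y in mingled])

# Product-sum of the mingled record about M, K, computed directly ...
product_sum_direct = sum((x - M) * (y - K) for x, y in mingled)

# ... and from the record means alone, as in the text.
product_sum_from_means = 0.0
for record in (record_1, record_2):
    n = len(record)
    m = mean([x for x, _ in record])
    k = mean([y for _, y in record])
    product_sum_from_means += n * (m - M) * (k - K)

r_mingled = corr(mingled)
```

The two product-sums coincide because the within-record product-sums are zero, and the mingled record shows a high correlation created wholly by the difference of means.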

Reduction of Correlation due to Mingling of Uncorrelated with 
Correlated Pairs. 

16.10. Suppose that $n_1$ observations of $x$ and $y$ give a correlation
coefficient

$$r_1 = \frac{S(xy)}{n_1 \sigma_x \sigma_y}$$

Now, let $n_2$ pairs be added to the material, the means and standard deviations
of $x$ and $y$ being the same as in the first series of observations, but the
correlation zero. The value of $S(xy)$ will then be unaltered, and we will
have:

$$r_2 = \frac{S(xy)}{(n_1 + n_2)\,\sigma_x \sigma_y}$$

Whence

$$\frac{r_2}{r_1} = \frac{n_1}{n_1 + n_2} \tag{16.11}$$

Suppose, for example, that a number of bones of the human skeleton have
been disinterred during some excavations, and a correlation $r_2$ is observed
between pairs of bones presumed to come from the same skeleton, this
correlation being rather lower than might have been expected, and subject
to some uncertainty owing to doubts as to the allocation of certain bones.
If $r_1$ is the value that would be expected from other records, the difference
might be accounted for on the hypothesis that, in a proportion $(r_1 - r_2)/r_1$
of all the pairs, the bones do not really belong to the same skeleton, and
have been virtually paired at random.
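Equation (16.11) and the proportion $(r_1 - r_2)/r_1$ may be verified on a tiny artificial case. The pairs below are invented so that the added material has the same means and standard deviations as the original but zero correlation; the function names are hypothetical.

```python
from math import sqrt


def corr(pairs):
    """Product-moment correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


def fraction_paired_at_random(r1, r2):
    """Proportion (r1 - r2)/r1 of pairs that must be uncorrelated, by (16.11)."""
    return (r1 - r2) / r1


# n1 = 2 correlated pairs; n2 = 4 added pairs with the same means (0)
# and standard deviations (1) but no correlation.
correlated = [(-1, -1), (1, 1)]
uncorrelated = [(-1, -1), (-1, 1), (1, -1), (1, 1)]

r1 = corr(correlated)
r2 = corr(correlated + uncorrelated)
share = fraction_paired_at_random(r1, r2)
```

Here $r_1 = 1$ and the mingled material gives $r_2 = n_1/(n_1 + n_2) = 1/3$, so that the proportion recovered is 2/3, which is indeed the share of added pairs (4 out of 6).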

The Weighted Mean. 

16.11. The arithmetic mean $M$ of a series of values of a variable $X$
was defined as the quotient of the sum of those values by their number $N$,
or

$$M = S(X)/N$$

If, on the other hand, we multiply each individual observed value of $X$
by some numerical coefficient or weight $W$, the quotient of the sum of such
products by the sum of the weights is defined as a weighted mean of $X$, and
may be denoted by $M'$; so that

$$M' = S(WX)/S(W)$$

The distinction between “weighted” and “unweighted” means is,
it should be noted, very often formal rather than essential, for the
“weights” may be regarded as actual, estimated or virtual frequencies.
The weighted mean then becomes simply an arithmetic mean, in which
some new quantity is regarded as the unit. Thus, if we are given the means
$M_1, M_2, M_3, \ldots M_r$ of $r$ series of observations, but do not know the
number of observations in every series, we may form a general average by
taking the arithmetic mean of all the means, viz. $S(M)/r$, treating the series
as the unit. But if we know the number of observations in every series it
will be better to form the weighted mean $S(NM)/S(N)$, weighting each mean
in proportion to the number of observations in the series on which it is
based. The second form of average would be quite correctly spoken of as
a weighted mean of the means of the several series: at the same time, it
is simply the arithmetic mean of all the series pooled together, i.e. the
arithmetic mean obtained by treating the observation and not the series
as the unit.

16.12. To give an arithmetical illustration, if a commodity is sold
at different prices in different markets, it will be better to form an average
price, not by taking the arithmetic mean of the several market prices,
treating the market as the unit, but by weighting each price in proportion
to the quantity sold at that price, if known, i.e. treating the unit of quantity
as the unit of frequency. Thus, if wheat has been sold in market A at an
average price of 29s. 1d. per quarter, in market B at an average price of
27s. 7d. and in market C at an average price of 28s. 4d., we may, if no
statement is made as to the quantities sold at these prices (as very often
happens in the case of statements as to market prices), take the arithmetic
mean (28s. 4d.) as the general average. But if we know that 23,930 qrs.
were sold at A, only 26 qrs. at B and 3,933 qrs. at C, it will be better to take
the weighted mean

$$\frac{(29\text{s. }1\text{d.} \times 23{,}930) + (27\text{s. }7\text{d.} \times 26) + (28\text{s. }4\text{d.} \times 3{,}933)}{27{,}889} = 29\text{s. }0\text{d.}$$

to the nearest penny. This is appreciably higher than the arithmetic mean
price, which is lowered by the undue importance attached to the small
markets B and C.
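The wheat example is easily reworked by reducing every price to pence (12 pence to the shilling). The sketch below uses the figures of the text; the quantity for market B is taken as 26 quarters, the value implied by the stated total of 27,889, and the helper names are introduced only for the illustration.

```python
def to_pence(shillings, pence):
    """Convert a price in shillings and pence to pence."""
    return 12 * shillings + pence


def to_shillings_pence(total_pence):
    """Round to the nearest penny and return (shillings, pence)."""
    return divmod(round(total_pence), 12)


# market: (price per quarter in pence, quarters sold)
markets = {
    "A": (to_pence(29, 1), 23930),
    "B": (to_pence(27, 7), 26),
    "C": (to_pence(28, 4), 3933),
}

prices = [price for price, _ in markets.values()]
quantities = [qty for _, qty in markets.values()]

unweighted_mean = sum(prices) / len(prices)
weighted_mean = sum(p * q for p, q in zip(prices, quantities)) / sum(quantities)

unweighted_s_d = to_shillings_pence(unweighted_mean)
weighted_s_d = to_shillings_pence(weighted_mean)
```

The unweighted mean comes to 28s. 4d. and the weighted mean to 29s. 0d. to the nearest penny, the latter being dominated by the large sales in market A.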

16.13. In the case of index-numbers for exhibiting the changes in
average prices from year to year (cf. 7.34), it may make a sensible difference
whether we take the simple arithmetic mean of the index-numbers for
different commodities in any one year as representing the price-level in
that year, or weight the index-numbers for the several commodities according
to their importance from some point of view; and much has been
written as to the weights to be chosen. If, for example, our standpoint
be that of some average consumer, we may take as the weight for each
commodity the sum which he spends on that commodity in an average
year, so that the frequency of each commodity is taken as the number of
shillings or pounds spent thereon instead of simply as unity.

16.14. Rates or ratios like the birth-, death- or marriage-rates of a
country may be regarded as weighted means. For, treating the rate for
simplicity as a fraction, and not as a rate per 1000 of the population,

$$\text{Birth-rate of whole country} = \frac{\text{Total births}}{\text{Total population}} = \frac{S(\text{Birth-rate in each district} \times \text{population in that district})}{S(\text{Population of each district})}$$

i.e. the rate for the whole country is the mean of the rates in the different
districts, weighting each in proportion to its population. We use the
weighted and unweighted means of such rates as illustrations in 16.16
below.

16.15. It is evident that any weighted mean will in general differ from
the unweighted mean of the same quantities, and it is required to find an
expression for this difference. If $r$ be the correlation between weights and
variables, $\sigma_w$ and $\sigma_x$ the standard deviations and $\bar{w}$ the mean weight, we
have at once

$$S(WX) = N(M\bar{w} + r\sigma_w\sigma_x)$$

whence

$$M' = M + r\sigma_x\,\frac{\sigma_w}{\bar{w}} \tag{16.12}$$





That is to say, if the weights and variables are positively correlated, the
weighted mean is the greater; if negatively, the less. In some cases $r$ is
very small, and then weighting makes little difference, but in others the
difference is large and important, $r$ having a sensible value and $\sigma_x\sigma_w/\bar{w}$ a
large value.

16.16. The difference between weighted and unweighted means of
death-rates, birth-rates or other rates on the population in different
districts is, for instance, nearly always of importance. Thus we have the
following figures for rates of pauperism (Jour. Roy. Stat. Soc., vol. 59, 1896,
p. 319):

                  Percentages of the Population in receipt of Relief.

 January 1.       Arithmetic Mean of Rates        England and Wales
                   in different Districts.           as a whole.

    1850                   6.61                         5.80
    1860                   5.20                         4.26
    1870                   5.45                         4.77
    1881                   3.68                         3.12
    1891                   3.29                         2.69


In this case the weighted mean is markedly the less, and the correlation
between the population of a district and its pauperism must therefore be
negative, the larger (on the whole urban) districts having the lower percentage
in receipt of relief. On the other hand, for the decade 1881-90 the
average birth-rate for England and Wales was 32.34 per thousand, the
arithmetic mean of the rates for the different districts 30.34 only. The
weighted mean was therefore the greater, the birth-rate being higher in the
more populous (urban) districts, in which there is a greater proportion of
young married persons.

For the year 1891 the average population of a poor law district was
found to be roughly 45,900 and the standard deviation $\sigma_w$ 56,400 (populations
ranging from under 2000 to over half a million). The standard
deviation $\sigma_x$ of the percentages of the population in receipt of relief was
1.24. We have therefore, from (16.12), for the correlation between pauperism
and population,

$$r = -\frac{3.29 - 2.69}{1.24} \times \frac{459}{564} = -0.39$$

For the birth-rate, on the other hand, assuming that $\sigma_w/\bar{w}$ is approximately
the same for the decade 1881-90 as in 1891, and neglecting the
fact that in a few instances Registration Districts differ from Poor-law
Unions, we have, $\sigma_x$ being 4.08,

$$r = \frac{32.34 - 30.34}{4.08} \times \frac{459}{564} = +0.40$$

The closeness of the numerical values of $r$ in the two cases is, of course,
accidental.
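Equation (16.12), solved for r, reproduces the two coefficients just found. The function below is a bare transcription of that rearrangement, with the figures quoted in the text; its name and argument names are introduced here for illustration only.

```python
def correlation_from_means(weighted_mean, unweighted_mean, sd_x, mean_w, sd_w):
    """Solve M' = M + r * sd_x * sd_w / mean_w, equation (16.12), for r."""
    return (weighted_mean - unweighted_mean) / sd_x * mean_w / sd_w


# Pauperism, 1891: percentage relieved, districts weighted by population.
r_pauperism = correlation_from_means(
    weighted_mean=2.69, unweighted_mean=3.29,
    sd_x=1.24, mean_w=45900, sd_w=56400)

# Birth-rate, 1881-90, assuming the same ratio of sd_w to mean_w.
r_birth_rate = correlation_from_means(
    weighted_mean=32.34, unweighted_mean=30.34,
    sd_x=4.08, mean_w=45900, sd_w=56400)
```

Rounded to two places, the results are -0.39 and +0.40, as in the text.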

16.17. The principle of weighting finds one very important application
in the treatment of such rates as death-rates, which are largely affected
by the age and sex composition of the population. Neglecting, for
simplicity, the question of sex, suppose the numbers of deaths are noted
in a certain district for, say, the age-groups 0-, 10-, 20-, etc., in
which the fractions of the whole population are $p_0$, $p_1$, $p_2$, etc., where
$S(p) = 1$. Let the death-rates for the corresponding age-groups be $d_0$,
$d_1$, $d_2$, etc. Then the ordinary or crude death-rate for the district is

$$D = S(dp) \tag{16.13}$$

For some other district taken as a basis of comparison, perhaps the
country as a whole, the death-rates and fractions of the population in the
several age-groups may be $\delta_1, \delta_2, \delta_3, \ldots$, $\pi_1, \pi_2, \pi_3, \ldots$, and the crude
death-rate

$$\Delta = S(\delta\pi) \tag{16.14}$$

Now, $D$ and $\Delta$ differ either because the $d$'s and $\delta$'s differ or because
the $p$'s and $\pi$'s differ, or both. It may happen that really both districts
are about equally healthy, and the death-rates approximately the same
for all age-classes, but, owing to a difference of weighting, the first average
may be markedly higher than the second, or vice versa. If the first
district be a rural district and the second urban, for instance, there will be
a larger proportion of the old in the former, and it may possibly have a
higher crude death-rate than the second, in spite of lower death-rates in
every class. The comparison of crude death-rates is therefore liable to
lead to erroneous conclusions. The difficulty may be got over by averaging
the age-class death-rates in the district not with the weights $p_1, p_2, p_3, \ldots$
given by its own population, but with the weights $\pi_1, \pi_2, \pi_3, \ldots$ given
by the population of the standard district. The standardised death-rate
for the district will then be

$$D' = S(d\pi) \tag{16.15}$$

and $D'$ and $\Delta$ will be comparable as regards age-distribution. There is
obviously no difficulty in taking sex into account as well as age if necessary.
The death-rates must be noted for each sex separately in every
age-class and averaged with a system of weights based on the standard
population. The method is also of importance for comparing death-rates
in different classes of the population, e.g. those engaged in given occupations,
as well as in different districts, and is used for both these purposes
in the publications of the Registrar-General for England and Wales.

16.18. Difficulty may arise in practical cases from the fact that
the death-rates $d_1, d_2, d_3, \ldots$ are not known for the districts or classes
which it is desired to compare with the standard population, but only
the crude rates $D$ and the fractional populations of the age-classes $p_1, p_2,
p_3, \ldots$ The difficulty may be partially obviated (cf. 4.16 and Example
4.3, pp. 58-60) by forming what is termed an index death-rate $\Delta'$ for
the class or district, $\Delta'$ being given by

$$\Delta' = S(\delta p) \tag{16.16}$$

i.e. the rates of the standard population averaged with the weights of
the district population. It is the crude death-rate that there would be in
the district if the rate in every age-class were the same as in the standard
population. An approximate standardised death-rate for the district or
class is then given by

$$D'' = D \times \frac{\Delta}{\Delta'} \tag{16.17}$$

$D''$ is not necessarily, nor generally, the same as $D'$. It can only be the
same if

$$\frac{S(d\pi)}{S(dp)} = \frac{S(\delta\pi)}{S(\delta p)}$$

This will hold good if, e.g., the death-rates in the standard population
and the district stand to one another in the same ratio in all age-classes,
i.e. $\delta_1/d_1 = \delta_2/d_2 = \delta_3/d_3 =$ etc. This method of standardisation was used in
the Annual Summaries of the Registrar-General for England and Wales.
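The two methods of standardisation in 16.17 and 16.18 may be compared on an invented example. All the rates and population fractions below are hypothetical; they are chosen so that the district has the lower death-rate in every age-class but the older population, and so that the standard rates bear a constant ratio to the district rates, in which case (16.15) and (16.17) should agree.

```python
def weighted_rate(rates, fractions):
    """S(rate x fraction): a death-rate averaged with the given weights."""
    return sum(r * f for r, f in zip(rates, fractions))


# Hypothetical district: age-class death-rates d and population fractions p.
d = [0.004, 0.010, 0.060]
p = [0.30, 0.50, 0.20]

# Hypothetical standard population: rates delta (1.25 times the district's
# in every age-class) and population fractions pi.
delta = [0.005, 0.0125, 0.075]
pi = [0.40, 0.50, 0.10]

D = weighted_rate(d, p)                # crude rate of the district, (16.13)
Delta = weighted_rate(delta, pi)       # crude rate of the standard, (16.14)
D_direct = weighted_rate(d, pi)        # standardised rate D', (16.15)
Delta_index = weighted_rate(delta, p)  # index death-rate, (16.16)
D_indirect = D * Delta / Delta_index   # approximate standardised rate D'', (16.17)
```

The crude rate of the district exceeds that of the standard population although its rate is lower in every age-class; either standardised rate removes the anomaly. If the ratio of `delta` to `d` is made to vary between age-classes, `D_direct` and `D_indirect` will in general no longer coincide.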

16.19. Roth methods of standardisation- that of 16.17 and that of 
16.18 —are of great importance. They are obviously applicable to other 
rates besides death-rates, e.g. birth-rates. Further, they may readily be 
extended into quite different fields. Thus it has been suggested that 
standardised average heights or standardised average weights of the children 
in different schools might be obtained on the basis of a standard school 
population of given age and sex composition, or indeed of given composition
as regards hair- and eye-colour as well.

16.20. In 16.11 to 16.16 we have dealt only with the theory of the
weighted arithmetic mean, but it should be noted that any form of average
can be weighted. Thus a weighted median can be formed by finding the
value of the variable such that the sum of the weights of lesser values is
equal to the sum of the weights of greater values. A weighted mode
could be formed by finding the value of the variable for which the sum
of the weights was greatest, allowing for the smoothing of casual fluctuations.
Similarly, a weighted geometric mean $G$ could be calculated by
weighting the logarithms of every value of the variable before taking the
arithmetic mean, i.e.

$$\log G = \frac{S(W \log X)}{S(W)}$$
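A minimal sketch of two of these weighted averages follows. The function names are ours, and the weighted median is taken, as a working assumption for discrete data, to be the smallest value at which the cumulated weight reaches half the total; the text's definition is the limiting form of this rule.

```python
import math

def weighted_geometric_mean(values, weights):
    """Antilogarithm of the weighted arithmetic mean of the logarithms."""
    log_mean = sum(w * math.log(x) for x, w in zip(values, weights)) / sum(weights)
    return math.exp(log_mean)

def weighted_median(values, weights):
    """Smallest value whose cumulated weight reaches half the total weight."""
    half = sum(weights) / 2.0
    running = 0.0
    for x, w in sorted(zip(values, weights)):
        running += w
        if running >= half:
            return x
    return None  # reached only when no values are supplied

# Hypothetical values and weights, for illustration only.
print(weighted_geometric_mean([2.0, 8.0], [3.0, 1.0]))
print(weighted_median([1.0, 2.0, 3.0], [1.0, 1.0, 5.0]))
```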


SUMMARY. 

1. The standard deviation of the sum of variables $X_1, X_2, \ldots, X_N$
is given by

$$\sigma^2 = \sigma_1^2 + \sigma_2^2 + \ldots + \sigma_N^2 + 2r_{12}\sigma_1\sigma_2 + 2r_{13}\sigma_1\sigma_3 + \ldots + 2r_{23}\sigma_2\sigma_3 + \ldots$$

2. In particular, the variance of the sum of N uncorrelated variates is
the sum of their variances.

3. If $X_1$, $X_2$ and $X_3$ are uncorrelated, the indices $X_1/X_3$ and $X_2/X_3$ will
nevertheless be correlated in general.



4. If X and Y are uncorrelated in each of two separate records, they
will be correlated in the sum of the two records, unless either the means
of X or the means of Y, or both, are the same in the two records.

5. If correlated and uncorrelated material is mingled, the correlation 
in the total is lower than that in the correlated portion. 

6. An arithmetic mean is weighted when, in the calculation of S(X),
each value of the variate is multiplied by a weight W.

7. The weighted arithmetic mean is greater or less than the unweighted
mean according as the weights and variables are positively or negatively
correlated.


EXERCISES. 

16.1. (Data from the Decennial Supplements to the Annual Reports of the
Registrar-General for England and Wales.) The following particulars are found
for at) small registration districts in which the number of births in a decade
ranged between 1500 and 2500:—



                  Proportion of Male Births
                  per 1000 of all Births.

                  Mean.       Standard deviation.

1881-1890         508.1       12.80
1891-1900         508.4       10.37
Both decades      508.25      11.65


It is believed, however, that a great part of the observed standard deviation 
is due to mere “fluctuations of sampling” of no real significance. 

Given that the correlation between the proportions of male births in a
district in the two decades is -t 0 00, estimate (1) the true standard deviation
freed from such fluctuations of sampling; (2) the standard deviation of fluctuations
of sampling, i.e. of the errors produced by such fluctuations in the observed
proportions of male births.

16.2. (Data from Pearson, ref. (345).) The coefficients of variation for
breadth, height and length of certain skulls are 3.80, 3.50 and 3.21 per cent,
respectively. Find the “spurious correlation” between the breadth/length and
height/length indices, absolute measures being combined at random so that they
are uncorrelated.

16.3. (Data from Boas, communicated to Pearson; cf. Fawcett and Pearson,
Proc. Roy. Soc., vol. 62, p. 413.) From short series of measurements on American
Indians, the mean coefficient of correlation found between father and son, and
father and daughter, for cephalic index, is 0.14; between mother and son, and
mother and daughter, 0.33. Assuming these coefficients should be the same if it
were not for the looseness of family relations, find the proportion of children not 
due to the reputed father. 

16.4. Find the correlation between $X_1 + X_2$ and $X_2 + X_3$, $X_1$, $X_2$ and $X_3$ being
uncorrelated.

16.5. Find the correlation between $X_1$ and $aX_1 + bX_2$, $X_1$ and $X_2$ being
uncorrelated.





16.6. (Referring to Example 15.4, p. 202.) Use the answer to Exercise 16.5 
to estimate, very roughly, the correlation that would be found between annual 
movements in infantile and general mortality if the mortality of those under 
and over 1 year of age were uncorrelated. Note that— 

    General mortality per 1000 of population
        = Infantile mortality per 1000 births × (Births / Population)
          + Deaths over one year per 1000 of population

and treat the ratio of births to population as if it were constant at a rough
average value, say 0.032. The standard deviation of annual movements in
infantile mortality is (loc. cit.) 10.76, and that of annual movements in mortality
other than infantile may be taken as sensibly the same as that of general
mortality, or, say, 113 units.

16.7. If the relation

$$ax_1 + bx_2 + cx_3 = 0$$

holds for all values of $x_1$, $x_2$ and $x_3$ (which are, in our usual notation, deviations
from the respective arithmetic means), find the correlations between $x_1$, $x_2$ and
$x_3$ in terms of their standard deviations and the values of a, b and c.

16.8. What is the effect on a weighted mean of errors in the weights of the
quantities weighted, such errors being uncorrelated with one another, with the
weights or with the variables: (1) if the arithmetic mean values of the errors 
are zero, (2) if the arithmetic mean values of the errors are not zero? 

16.9. The following are the variances of the rainfall (1) for January to March, 
(2) for April to December, (3) for the whole year, at Greenwich in the eighty 
years 1841-1920, the unit being a millimetre :— 

    January-March . . .   $\sigma_1^2 = 1,521$

    April-December . .    $\sigma_2^2 = 8,968$

    Whole year . . . .    $\sigma_3^2 = 10,754$

Find the correlation between the rainfall in January-March and April-December.



CHAPTER 17. 


SIMPLE CURVE FITTING. 

The Problem. 

17.1. In this chapter we turn aside somewhat from the line of 
development of previous chapters in order to study a subject of considerable
theoretical and practical importance: the representation of relationship
between two variables by simple algebraic expressions. Our work
on correlation has already led us to fit regression lines and planes to the
means of arrays. We now attack a rather more general problem. An 
illustration will make clear the type of inquiry involved. 

Table 17.1 shows the estimated distance and velocities of recession of
certain nebulae in the outlying parts of the visible universe.

Table 17.1. Estimated Distance and Velocities of Recession of 10 Extra-galactic Nebulae
(Edwin Hubble and Milton L. Humason, “The Velocity-distance Relation among
Extra-galactic Nebulae,” Contributions from Mount Wilson Observatory, Carnegie
Institute of Washington, No. 427, Astrophysical Journal, vol. 74, 1931, pp. 43-80).


Constellation in which       Mean Velocity              Distance
the Nebula is situated.      (kilometres per second).   (millions of parsecs).

Isolated Nebula II                  630                      1.20
Virgo                               890                      1.82
Isolated Nebula I                 2,350                      3.31
Pegasus                           3,810                      7.24
Pisces                            4,630                      6.92
Cancer                            4,820                      9.12
Perseus                           5,230                     10.97
Coma                              7,500                     14.45
Ursa Major                       11,800                     22.91
Leo                              19,600                     36.31


A little inspection of the table will show that there appears to be some
relation between distance and velocity: the greater the one, the greater
the other, with only one exception. A diagram makes the relation clearer
still. In fig. 17.1 we have taken the two variables velocity and distance
as rectangular co-ordinates y and x, and have marked for each nebula
a point whose co-ordinates are the distance and velocity of that nebula.
The ten points so obtained evidently lie very approximately on a straight
line or, to express the same fact algebraically, the ten values of the variables
are closely represented by an equation of the form


$$y = a_0 + a_1 x \qquad (17.1)$$


17.2. No straight line, however, passes exactly through all the points,
although a great many lines may be drawn which nearly do so. The
question then arises, is there a straight line which fits the points better
than all others, and if so, which is it? Or, in other language, what
values of $a_0$ and $a_1$ in equation (17.1) must we take to get the best representation
of the linear relationship between the two variables? And, as
a further question, can we devise a measure of the closeness of the fit of 
the various lines which can be drawn ? 



[Fig. 17.1. Relationship between Distance and Velocity of Recession in
Certain Extra-galactic Nebulae. (Table 17.1.) Horizontal axis: distance (millions of parsecs).]

17.3. In the foregoing illustration it is clear from the data or from
the diagram that a linear relationship between the variables gives a very
close picture of the truth. In other cases the points of the diagram will
lie more or less on a curve, and no straight line will give a satisfactory
representation. We should then wish to investigate whether the dependence
of y on x may be suitably represented by the more general equation

$$y = a_0 + a_1 x + a_2 x^2 + \ldots + a_p x^p \qquad (17.2)$$

which, in the diagram, corresponds to a curve of the type known as
parabolic. The number p indicates the degree of the parabola, and
we speak of quadratic, cubic, quartic parabolas, meaning curves of type
(17.2) with p = 2, 3, 4, respectively.

17.4. Our general problem may, then, be stated as follows: Given
n pairs of values of two variables, $X_1 Y_1, X_2 Y_2, \ldots, X_n Y_n$, to express
the values of one of them as nearly as may be in terms of the other by an
equation of the form (17.2); and to measure the closeness of the approximation
of the values of y given by the equation to the actual values. In
geometrical language, given n points in a plane, to fit to them a curve of
the parabolic type (17.2) and to measure the closeness of fit.

17.5. The representation of data in this way may serve several
purposes. In the first place, it may present the relationship between
the two variables in a useful summary form. Secondly, it may be used
to interpolate, i.e. to estimate the values of one variable which would
correspond to specified values of the other. In fig. 17.1, for example,
the straight line which has been drawn in, and whose equation is obtained
below, tells us what we might expect to be the velocity of a nebula whose
distance is, say, 20 million parsecs, on the assumption that the linear
relation holds good for nebulae in general.

17.6. Again, the representation may also be very suggestive to the
theorist. The linear form of the relationship between the variables of 
Table 17.1 means more than a convenient summary of the facts, and has 
inspired a great deal of research into the nature of the physical universe. 
In such cases, the derived equation is regarded as the expression of a 
law of nature, and the deviations of the observed values from those given 
by it are interpreted as fluctuations arising from experimental error or 
secondary perturbations. This standpoint is common in physics, in which
data often lie very closely about a smooth curve. 

The Method of Least Squares. 

17.7. Let us suppose that we have n pairs of values $X_1 Y_1, \ldots, X_n Y_n$,
and that we wish to represent them by an equation of the type (17.2).
Our problem is, having fixed the value of p, to determine the constants
$a_0, \ldots, a_p$ in terms of the observed values X, Y, so as to get the best
possible fit.

The expression “best possible fit” may be defined in more than one
way, and consequently there is no unique method of determining the
constants. Several methods have been proposed, and our choice between
them is determined mainly by convenience. One way, which is suggested
by the geometrical representation, is to choose the curve of equation
(17.2) so that the sum of the distances (taken as positive) of the points
from it is a minimum, the sum of the distances being regarded as a measure
of goodness of fit, and the “best” fit being given by the curve of specified
degree for which that sum is least. But this method, whatever its theoretical
attractions, suffers from the disadvantage that it is difficult to apply
in practice except for the straight line.

An alternative method, which is in almost universal use at the present 
time, is that known as the Method of Least Squares, and we proceed 
to discuss it at length. We have already used it to find regression lines 
(11.20 and 14.4).

17.8. If we substitute the value $X_r$ in equation (17.2) we get a
quantity $y_r$, given by

$$y_r = a_0 + a_1 X_r + a_2 X_r^2 + \ldots + a_p X_r^p \qquad (17.3)$$

This is not in general the same as $Y_r$, and we therefore define the
residual $\zeta_r$ as

$$\zeta_r = Y_r - y_r = Y_r - a_0 - a_1 X_r - \ldots - a_p X_r^p \qquad (17.4)$$

There will be n residuals, one for each pair X, Y, and they are all zero
if, and only if, the curve is a perfect fit. We then take the sum of the
squares of residuals:

$$U = S(\zeta_r^2) = S(Y_r - a_0 - a_1 X_r - \ldots - a_p X_r^p)^2 \qquad (17.5)$$

If U is zero, each residual must be zero, and the data are represented
perfectly by the equation. Except in this case, U is positive. The
further the points lie from the curve of equation (17.2), the greater U
will be. U therefore provides one measure of the closeness of fit. From
this standpoint, the best fit will be that for which U is least.



The Method of Least Squares adopts this criterion, and states that
the constants a shall be determined so that U is a minimum.

17.9. The reason for taking the sum of squares of residuals, rather 
than the sum of residuals simply, is akin to that which led us to prefer 
the standard deviation to the mean deviation as a measure of dispersion
(Chap. 8), namely, that the former is more convenient in theory and leads 
to equations which are easier to handle in practice. 

17.10. It was formerly the custom, and is so still in works on the 
theory of observations, to derive the method of least squares from certain 
theoretical considerations, the assumed normality of the distribution of 
errors of observation being one such. It is, however, more than doubtful 
whether the conditions for the theoretical validity of the method are 
realised in statistical practice, and the student would do well to regard
the method as recommended chiefly by its comparative simplicity and by
the fact that it has stood the test of experience.

17.11. Consider now the quantity U, given by equation (17.5).
$a_0, a_1, \ldots, a_p$ are to be chosen so that this is a minimum, say $U_0$. Let
us imagine this done.

If, now, we substitute in equation (17.5) $a_0 + \epsilon_0$ for $a_0$, $a_1 + \epsilon_1$ for $a_1$,
$a_2 + \epsilon_2$ for $a_2$, and so on, we shall get a quantity $U_1$ given by

$$U_1 = S\{Y - (a_0 + \epsilon_0) - (a_1 + \epsilon_1)X - \ldots - (a_p + \epsilon_p)X^p\}^2$$

and $U_1$ is greater than $U_0$ for all values of $\epsilon_0, \ldots, \epsilon_p$.

Now,

$$U_1 = S\{(Y - a_0 - a_1 X - \ldots - a_p X^p) - (\epsilon_0 + \epsilon_1 X + \ldots + \epsilon_p X^p)\}^2$$

$$= S(Y - a_0 - a_1 X - \ldots - a_p X^p)^2$$

$$- 2S(Y - a_0 - a_1 X - \ldots - a_p X^p)(\epsilon_0 + \epsilon_1 X + \ldots + \epsilon_p X^p)$$

$$+ S(\epsilon_0 + \epsilon_1 X + \ldots + \epsilon_p X^p)^2$$

The first of these terms is equal to $U_0$. Hence, if $U_1 > U_0$, we must have

$$-2S(Y - a_0 - a_1 X - \ldots - a_p X^p)(\epsilon_0 + \epsilon_1 X + \ldots + \epsilon_p X^p) + S(\epsilon_0 + \epsilon_1 X + \ldots + \epsilon_p X^p)^2 > 0 \qquad (17.6)$$

This is to be true for all values of $\epsilon_0, \ldots, \epsilon_p$. Let us then take these
quantities to be very small. The second term in equation (17.6), depending
as it does on the squares of the $\epsilon$'s, will be small compared with the first,
and may be neglected. (17.6) will then be true only if the first term
vanishes, for otherwise the $\epsilon$'s could be so chosen in sign as to make the
first term negative.

Hence,

$$S(Y - a_0 - a_1 X - \ldots - a_p X^p)(\epsilon_0 + \epsilon_1 X + \ldots + \epsilon_p X^p) = 0 \qquad (17.7)$$

This is true for all small values of the $\epsilon$'s. Hence the coefficients of
$\epsilon_0, \epsilon_1, \ldots, \epsilon_p$ all vanish, i.e. we have:

$$S(Y) - a_0 n - a_1 S(X) - \ldots - a_p S(X^p) = 0$$

$$S(YX) - a_0 S(X) - a_1 S(X^2) - \ldots - a_p S(X^{p+1}) = 0$$

$$S(YX^2) - a_0 S(X^2) - a_1 S(X^3) - \ldots - a_p S(X^{p+2}) = 0$$

$$\ldots$$

$$S(YX^p) - a_0 S(X^p) - a_1 S(X^{p+1}) - \ldots - a_p S(X^{2p}) = 0 \qquad (17.8)$$

The equations (17.8) give us $p + 1$ equations in the $p + 1$ unknowns
$a_0, \ldots, a_p$. Hence they may be solved so as to give the $a$'s in terms of
the calculable quantities $S(X), S(X^2), \ldots, S(X^{2p})$, $S(Y), S(YX), \ldots,
S(YX^p)$.
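The formation and solution of equations (17.8) may be sketched as follows. This is an illustrative routine of our own (the name `fit_polynomial` is an assumption), which forms the required sums directly and solves the linear equations by elimination with pivoting; it is not put forward as the computing scheme of the text, and for curves of high degree the equations may be numerically ill-conditioned.

```python
def fit_polynomial(xs, ys, p):
    """Least-squares constants a_0..a_p from the normal equations (17.8)."""
    size = p + 1
    # Sums S(X^k) for k = 0..2p (S(X^0) is n) and S(Y X^k) for k = 0..p.
    sx = [sum(x ** k for x in xs) for k in range(2 * p + 1)]
    sxy = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(size)]
    # Row i of the augmented matrix states: sum over j of a_j S(X^(i+j)) = S(Y X^i).
    m = [[float(sx[i + j]) for j in range(size)] + [float(sxy[i])]
         for i in range(size)]
    for col in range(size):
        # Bring the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Remove the unknown a_col from every other equation.
        for r in range(size):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]

# Hypothetical points lying exactly on y = 1 + 2x + 3x^2.
coeffs = fit_polynomial([0, 1, 2, 3, 4], [1, 6, 17, 34, 57], 2)
print(coeffs)
```

Since the five hypothetical points lie exactly on a quadratic parabola, the constants returned should reproduce 1, 2 and 3 to within rounding error.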

17.12. It will be seen that the solution of these equations depends on
the evaluation of the various summed quantities. A first step is therefore 
to calculate these sums, and this is done by a process very similar to that 
used in finding the moments of a distribution. 

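We can, in fact, express the equations in terms of moments. Dividing
each equation by n, and remembering that $\mu_r' = \frac{1}{n}S(X^r)$, we have:

$$\frac{1}{n}S(Y) - a_0 - a_1\mu_1' - \ldots - a_p\mu_p' = 0$$

$$\frac{1}{n}S(YX) - a_0\mu_1' - a_1\mu_2' - \ldots - a_p\mu_{p+1}' = 0$$

$$\ldots$$

$$\frac{1}{n}S(YX^p) - a_0\mu_p' - a_1\mu_{p+1}' - \ldots - a_p\mu_{2p}' = 0 \qquad (17.9)$$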

Equations for Fitting a Straight Line. 

17.13. In the simplest case, that of a straight line, we have p = 1, and
the equations (17.9) become:

$$\frac{1}{n}S(Y) = a_0 + a_1\mu_1'$$

$$\frac{1}{n}S(YX) = a_0\mu_1' + a_1\mu_2' \qquad (17.10)$$

In particular, if X and Y are measured about their means and hence
are denoted by x, y, we have:

$$\mu_1' = 0, \qquad S(y) = 0$$

and hence, from (17.10),

$$a_0 = 0, \qquad a_1 = \frac{S(yx)}{n\mu_2}$$

so that the fitted line is

$$y = \frac{x}{n\mu_2}S(yx) \qquad (17.11)$$

i.e. passes through the mean of X and Y. This is, in fact, the first regression
equation of (11.6) (p. 209) in another form.

17.14. In equation (17.2) it is customary to call x the “ independent ” 
variable and y the “ dependent ” variable. In any given case it is, as a 
rule, possible to regard either of the variables under consideration as the 
independent variable, and the other as the dependent variable. We shall 
then get two expressions, one giving variable A in terms of variable B, the
other giving B in terms of A; and there will be two curves of closest fit,
just as there are two regression lines in the theory of correlation. 

These two curves are not, in general, the same, and the result sounds a 
little paradoxical until we examine how the two curves are derived. We 
have, in fact, two definitions of closest fit, one minimising residuals of the
type $(A - a_0 - a_1 B - \ldots)^2$, the other minimising residuals of the type
$(B - a_0' - a_1' A - \ldots)^2$. On a priori grounds there is nothing to choose
between the two. 

17.15. Which of the two forms we choose will depend in practice on 
a variety of circumstances. Sometimes one variable is clearly marked out 
as the independent variable. For example, in considering the way in 
which a population varies with time, it is almost inevitable to regard the 
former as dependent on the latter, and not vice versa. In other cases the
choice is dictated by the purpose in view. For instance, in expressing the 
relationship between current and resistance in an electric circuit, an in¬ 
vestigator would probably take as the independent variable that factor 
over which he had direct control. Frequently, however, there is no guide 
of this kind, and it may be necessary to ascertain both curves. 1 

Calculation. 

17.16. The calculations necessary to fit a curve by the method of
least squares fall into two stages. First of all, the sums of squares which
appear in equation (17.8) must be found, or, what amounts to the same
thing, the moments. To fit a curve of degree p it is necessary to find 2p
sums of the type $S(X^k)$ and p + 1 sums of the type $S(YX^k)$ (including S(Y)).
The work is best carried out systematically after the manner of Chapter 9,
and several devices considerably shorten the arithmetical labour.

(a) By a suitable choice of origin and unit we can often reduce the
given values of X and Y to smaller numbers, a great help in calculating
the higher powers and sums. For instance, if the values of Y were 625,
650, 675, 700, we could take an origin at y = 625, and a scale of one unit
= 25, and our new values would then be 0, 1, 2, 3.

(b) If the values of the independent variable proceed by equal steps, 
and particularly if there is an odd number of them, the labour of calculation
is enormously reduced. We shall consider this important case in
some detail below (17.22).
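Device (a) may be illustrated by a short sketch. The Y values are those quoted in the text; the X values, and the helper names `code_values` and `fit_line`, are hypothetical and introduced only for the illustration. A line is fitted to the coded values and the constants are then converted back to the original origin and unit, giving the same result as a direct fit.

```python
def code_values(values, origin, unit):
    """Measure each value from a new origin in multiples of a new unit."""
    return [(v - origin) / unit for v in values]

def fit_line(xs, ys):
    """Least-squares line y = a0 + a1*x; returns (a0, a1)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a0 = (sy - a1 * sx) / n
    return a0, a1

xs = [1.0, 2.0, 3.0, 4.0]            # hypothetical values of X
ys = [625.0, 650.0, 675.0, 700.0]    # the values of Y quoted in the text
coded = code_values(ys, origin=625.0, unit=25.0)

b0, b1 = fit_line(xs, coded)              # constants in the coded scale
a0, a1 = 625.0 + 25.0 * b0, 25.0 * b1     # back to the original scale
direct = fit_line(xs, ys)                 # the same line fitted without coding
print(coded, (a0, a1), direct)
```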

When the various sums have been ascertained, the second stage, that 
of the solution of the equations (17.8), may be carried through. For a 
curve of degree p there are p + 1 of these equations. They are linear in
the unknowns a, and their solution offers only arithmetical difficulty. 

1 In this connection we may refer to a problem for which, so far as we are aware,
no general solution has been found. Given that the theoretical law relating y and x
is linear, but that the sets of values given in the data are both subject to error, what is
the unique straight line most probably (in some sense) representing the truth? The
least squares solutions will give us lines which, in a certain sense, are the most likely if
the dependent variable is subject to errors normally distributed; but they do not
yield a line which allows for errors in both variables.

Greenwood and Yule (Proc. Roy. Soc. Medicine, vol. 8, 1915, p. 113, Section of
Epidemiology) used the principal axis (12.9) as an empirically good solution. This
makes the sum of squares of perpendiculars from the points on to the line a minimum.

The difficulty is greatly intensified if the theoretical law is a polynomial of degree
higher than the first.

17.17. Before proceeding to consider some examples, we may remark
on one point of theoretical interest. It is always possible to fit a curve
of degree p exactly to p + 1 points; for instance, a straight line can be
drawn to pass exactly through two points, a cubic parabola through four
points, and so on. Thus, if we have n points we can always find a curve
of degree n - 1 which is an exact fit. But in practice n is rarely less than
ten, and a fitted curve of degree as high as this would have no practical 
value and very little theoretical interest. It is only exceptionally that use 
is found for fitted curves of degree higher than the fourth. 
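The statement that n points always admit an exact fit of degree n - 1 can be checked numerically. The sketch below uses Lagrange's interpolation formula, which is a technique we bring in for the purpose and not one discussed in the text; the four points and the name `lagrange_value` are hypothetical.

```python
def lagrange_value(points, x):
    """Value at x of the polynomial of degree len(points) - 1 through the points."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                # This factor is 1 at x = xi and 0 at x = xj.
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Four hypothetical points; a cubic parabola passes through all of them.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.0), (3.0, 5.0)]
residuals = [y - lagrange_value(points, x) for x, y in points]
print(residuals)
```

Every residual vanishes, so that U is zero; the curve nevertheless tells us nothing about the general trend of the data, which is why such fits are of little practical value.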

We will now consider some examples. 

Example 17.1. Let us fit a straight line to the data of Table 17.1. To
illustrate the method we will deal with both cases, taking first distance and
then velocity as the independent variable.

Denoting, then, distance by x and velocity by y, we wish to fit a curve
of the form

$$y = a_0 + a_1 x$$

For this we require $S(X)$, $S(X^2)$, $S(Y)$ and $S(YX)$. For the alternative
case we shall also require $S(Y^2)$.

The arithmetic is shown in Table 17.2. In successive columns we write,
for each nebula, $Y$, $X$, $X^2$, $YX$ and $Y^2$. Totals are shown at the foot of
the columns.

Equations (17.8) then become:

$$S(Y) - a_0 n - a_1 S(X) = 0$$

$$S(YX) - a_0 S(X) - a_1 S(X^2) = 0$$

or

$$61.26 - 10a_0 - 114.25a_1 = 0$$

$$1261.4988 - 114.25a_0 - 2371.6145a_1 = 0$$

Multiplying the first of these by 114.25 and the second by 10, and subtracting,
we get

$$5616.033 - 10663.0825a_1 = 0$$

$$a_1 = 0.527 \quad \text{(more accurately, 0.526680066)}$$

and hence,

$$a_0 = 0.109 \quad \text{(more accurately, 0.10868024)}$$

So that

$$y = 0.109 + 0.527x \qquad (a)$$

This line is shown in fig. 17.1.

If we wish to express distance in terms of velocity, we have, interchanging
X and Y in equations (17.8):

$$x = a_0' + a_1' y$$

$$S(X) - a_0' n - a_1' S(Y) = 0$$

$$S(XY) - a_0' S(Y) - a_1' S(Y^2) = 0$$

or

$$114.25 - 10a_0' - 61.26a_1' = 0$$

$$1261.4988 - 61.26a_0' - 672.8998a_1' = 0$$

whence

$$a_0' = -0.135, \qquad a_1' = 1.89$$

and

$$x = -0.135 + 1.89y \qquad (b)$$

Table 17.2. Practical Work for Fitting a Straight Line to the Data of Table 17.1.

                       Mean Velocity   Distance
Constellation.         (000 km. per    (millions of
                       second).        parsecs).
                       Y.              X.            X^2.         YX.          Y^2.

Isolated Nebula II      0.63            1.20          1.4400       0.7560       0.3969
Virgo                   0.89            1.82          3.3124       1.6198       0.7921
Isolated Nebula I       2.35            3.31         10.9561       7.7785       5.5225
Pegasus                 3.81            7.24         52.4176      27.5844      14.5161
Pisces                  4.63            6.92         47.8864      32.0396      21.4369
Cancer                  4.82            9.12         83.1744      43.9584      23.2324
Perseus                 5.23           10.97        120.3409      57.3731      27.3529
Coma                    7.50           14.45        208.8025     108.3750      56.2500
Ursa Major             11.80           22.91        524.8681     270.3380     139.2400
Leo                    19.60           36.31       1318.4161     711.6760     384.1600

Total                  61.26          114.25       2371.6145    1261.4988     672.8998


Equations (a) and (b) are nearly identical, for dividing (a) by 0.527
and rearranging, we have:

$$x = -0.207 + 1.90y$$

This is exceptional, and results from the closeness with which the points
lie to a straight line. The correlation between X and Y is, in fact, 0.997.
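The arithmetic of this example may be checked by a short computation. The sketch below is ours: the helper `fit_line` is a hypothetical name, the data are the ten pairs of Table 17.2 (velocity in thousands of kilometres per second, distance in millions of parsecs), and the correlation is obtained as the square root of the product of the two slopes, a relation valid here because both slopes are positive.

```python
import math

# (velocity Y in 000 km. per second, distance X in millions of parsecs)
nebulae = [(0.63, 1.20), (0.89, 1.82), (2.35, 3.31), (3.81, 7.24), (4.63, 6.92),
           (4.82, 9.12), (5.23, 10.97), (7.50, 14.45), (11.80, 22.91), (19.60, 36.31)]

def fit_line(us, vs):
    """Least-squares line v = c0 + c1*u; returns (c0, c1)."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suu = sum(u * u for u in us)
    suv = sum(u * v for u, v in zip(us, vs))
    c1 = (n * suv - su * sv) / (n * suu - su * su)
    c0 = (sv - c1 * su) / n
    return c0, c1

ys = [velocity for velocity, _ in nebulae]
xs = [distance for _, distance in nebulae]

a0, a1 = fit_line(xs, ys)   # velocity in terms of distance, equation (a)
b0, b1 = fit_line(ys, xs)   # distance in terms of velocity, equation (b)
r = math.sqrt(a1 * b1)      # correlation between X and Y

print(round(a0, 3), round(a1, 3))
print(round(b0, 3), round(b1, 2))
print(round(r, 3))
```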

Reduction of Data to Linear Form. 

17.18. Example 17.2. It sometimes happens that we may reduce
data to a linear form by some simple transformation. Table 17.3, for
example, shows the number of fronds of a duckweed plant on fourteen
successive days. The number of fronds (N) clearly does not increase
uniformly with time (x), and the curve of growth is not linear, as may be
seen by graphing N against x. There are theoretical reasons for inquiring
whether the law of growth may be represented by an equation of the form

$$N = ae^{bx}$$

A population which conformed to this equation would have the property
that its rate of increase at any moment was proportional to the size of
the population at that moment: its “birth-rate,” so to speak, would be a
constant.

Taking logarithms, we have:

$$\log_e N = \log_e a + bx$$

and if we now write $y = \log_e N$, we have:

$$y = \log_e a + bx$$

which is linear in x and y.

We should, of course, have a relation of the same form, with different
values of the constants a and b, if we took logarithms to base 10, which
is usually the more convenient procedure.

We therefore try the effect of fitting a straight line to x (the time) and
$\log_{10} N$ (log number of fronds). From fig. 17.2 it will be seen that the
fit is a close one.

[Fig. 17.2. Straight Line fitted to Data of Table 17.3. (Growth of Duckweed.)]


Table 17.3. Growth of Duckweed. (V. H. Blackman, Nature, 6th June 1936,
quoting data of Ashby and Oxley.)

Number of Fronds.   log10 N.      Days.
N.                  Y.            X.      X^2.     YX.

  100               2.0000000      1        1       2.0000000
  127               2.1038037      2        4       4.2076074
  171               2.2329961      3        9       6.6989883
  233               2.3673559      4       16       9.4694236
  323               2.5092025      5       25      12.5460125
  452               2.6551384      6       36      15.9308304
  654               2.8155777      7       49      19.7090439
  918               2.9628427      8       64      23.7027416
 1406               3.1479853      9       81      28.3318677
 2150               3.3324385     10      100      33.3243850
 2800               3.4471580     11      121      37.9187380
 4140               3.6170003     12      144      43.4040036
 5760               3.7604225     13      169      48.8854925
 8250               3.9164539     14      196      54.8303546

Total              40.8683755    105     1015     340.9594891

The preliminary work is shown in Table 17.3. We find first Y,
corresponding to $\log_{10} N$, then $S(X)$, $S(Y)$, $S(X^2)$, $S(YX)$. For this
particular example we do not require $S(Y^2)$. In view of the simple
character of the values of X there is little saving in taking other origins
or units for X and Y, although, if we were fitting a curve of higher order,
it might be an advantage to take a different origin for X.

Equations (17.8) then become:

$$S(Y) - na_0 - a_1 S(X) = 0$$

$$S(YX) - a_0 S(X) - a_1 S(X^2) = 0$$

or

$$40.8683755 - 14a_0 - 105a_1 = 0$$

$$340.9594891 - 105a_0 - 1015a_1 = 0$$

whence

$$a_0 = 1.785, \qquad a_1 = 0.1514$$

and

$$y = 1.785 + 0.1514x \qquad (a)$$

Raising 10 to this power, and remembering that $10^y = N$, we have:

$$N = 10^{1.785} \times 10^{0.1514x} \qquad (b)$$

which we may also write, expressing the powers of 10 as actual numbers:

$$N = 60.95 \times (1.417)^x$$
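The reduction to linear form may be reproduced in a few lines. The sketch below is ours and assumes logarithms to base 10 and the frond counts as read from Table 17.3; the variable names are hypothetical. Worked to full precision, the intercept comes out near 1.784, a shade below the rounded figure printed above, while the slope and the daily growth factor agree with the text.

```python
import math

# Number of fronds on days 1 to 14, as read from Table 17.3.
fronds = [100, 127, 171, 233, 323, 452, 654, 918,
          1406, 2150, 2800, 4140, 5760, 8250]
days = list(range(1, len(fronds) + 1))
logs = [math.log10(count) for count in fronds]   # Y = log10 N

n = len(days)
sx, sy = sum(days), sum(logs)
sxx = sum(x * x for x in days)
sxy = sum(x * y for x, y in zip(days, logs))

slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # a_1
intercept = (sy - slope * sx) / n                   # a_0

initial = 10 ** intercept       # multiplier of the fitted growth law
daily_factor = 10 ** slope      # ratio of each day's fitted count to the last

print(round(intercept, 4), round(slope, 4))
print(round(initial, 2), round(daily_factor, 3))
```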

17.19. Example 17.3. The process of taking logarithms may be
applied to both variables. In Table 17.4 are given the costs per unit of
electricity sold ($\eta$) and the number of units sold per head of the population
served by the undertaking ($\xi$) for 27 electricity undertakings. The data
were taken from the Returns of the Electricity Commission for 1933-34,
which cover about six hundred undertakings, by selecting every twenty-fifth.
They are, therefore, only a comparatively small sample, but they
reflect fairly accurately the general relationship between $\xi$ and $\eta$ for the
whole number of undertakings.

This relationship is illustrated by fig. 17.3, on which $\xi$ is graphed
against $\eta$. It will be seen that, broadly, the larger the number of units
sold per head, the lower the cost per unit.

The points of fig. 17.3 lie, in fact, about a curve which suggests a
relation of the form:

$$\eta = a\xi^{-b}$$

As $\xi$ becomes larger, $\eta$ becomes smaller, and as $\xi$ tends to zero, $\eta$ tends to
infinity. Let us try to fit a curve of this kind to the data.

We have:

$$\log \eta = \log a - b \log \xi$$

and, putting

$$y = \log \eta, \qquad x = \log \xi$$

$$y = \log a - bx$$

which is linear. We therefore proceed to fit a straight line to $\log \eta$ and
$\log \xi$.



[Fig. 17.3. Curve fitted to Data of Table 17.4. Horizontal axis: units sold per head of population.]


The preliminary work is shown in Table 17.4. Equations (17.8)
become, in the usual way,

$$5.2493 - 27a_0 - 50.1311a_1 = 0$$

$$7.3008 - 50.1311a_0 - 97.1450a_1 = 0$$

whence

$$a_0 = 1.31, \qquad a_1 = -0.601$$

and

$$y = 1.31 - 0.601x \qquad (a)$$

From which

$$\eta = 10^{1.31}\,\xi^{-0.601}$$

or

$$\eta = 20.42\,\xi^{-0.601} \qquad (b)$$
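The same fit may be sketched in code by taking logarithms of both variables and fitting a straight line. The pairs below are our reading of the 27 entries of Table 17.4 (cost in pence per unit, units sold per head); the variable names are hypothetical, and logarithms are taken to base 10 at full precision, so the constants may differ slightly in the last figure from those obtained with four-figure logarithms.

```python
import math

# (cost per unit sold in pence, units sold per head), read from Table 17.4.
undertakings = [
    (1.53, 63.1), (2.36, 12.1), (0.70, 394.2), (0.56, 220.5), (1.41, 52.4),
    (1.88, 119.4), (1.17, 181.6), (0.78, 293.8), (1.13, 170.4), (0.86, 184.1),
    (1.91, 68.0), (1.40, 80.7), (2.41, 29.0), (1.37, 53.4), (1.10, 93.0),
    (4.21, 19.9), (8.90, 25.6), (3.13, 30.4), (7.28, 16.7), (1.92, 77.8),
    (1.14, 120.1), (0.64, 68.8), (1.57, 60.5), (1.06, 93.9), (1.98, 22.1),
    (0.68, 196.2), (2.05, 60.1),
]

ys = [math.log10(cost) for cost, _ in undertakings]     # y = log eta
xs = [math.log10(units) for _, units in undertakings]   # x = log xi

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # a_1, i.e. -b
intercept = (sy - slope * sx) / n                   # a_0, i.e. log a
multiplier = 10 ** intercept                        # a in eta = a * xi^(-b)

print(round(intercept, 2), round(slope, 3), round(multiplier, 2))
```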


Fig. 17.4 shows the values of y plotted against those of x. The straight
line we have found cannot be described as a good fit, but so far as the eye
Table 17.4. Reduction of Non-linear Relation to Linear Form: Relationship
between Working Costs per Unit and Number of Units Sold in 27 Electricity
Undertakings. (Data from Return of Engineering and Financial Statistics, 1933-34,
Electricity Commission.)

                                   Working Costs   Units Sold (excluding
                                   per Unit Sold   bulk supplies) per
Name of Undertaking.               (pence).        Head of Population.   log η        log ξ
                                   η.              ξ.                    = Y.         = X.       YX.        X^2.

Aberdare                           1.53             63.1                  0.18469     1.8000     0.3324     3.2400
Barry U.D.C.                       2.36             12.1                  0.37291     1.0828     0.4038     1.1725
Bredbury and Romiley               0.70            394.2                 -0.15490     2.5957    -0.4021     6.7377
Chesterfield                       0.56            220.5                 -0.25181     2.3434    -0.5901     5.4915
Earby                              1.41             52.4                  0.14922     1.7193     0.2566     2.9560
Grange                             1.88            119.4                  0.27416     2.0770     0.5694     4.3139
Holmfirth                          1.17            181.6                  0.06819     2.2591     0.1541     5.1035
Lincoln                            0.78            293.8                 -0.10791     2.4681    -0.2663     6.0915
Mexborough                         1.13            170.4                  0.05308     2.2315     0.1185     4.9796
Nuneaton                           0.86            184.1                 -0.06550     2.2651    -0.1484     5.1307
Redcar                             1.91             68.0                  0.28103     1.8325     0.5150     3.3581
Slaithwaite                        1.40             80.7                  0.14613     1.9069     0.2787     3.6363
Tanfield                           2.41             29.0                  0.38202     1.4624     0.5587     2.1386
West Lancs R.D.C.                  1.37             53.4                  0.13672     1.7275     0.2362     2.9843
Dumfries Corporation               1.10             93.0                  0.04139     1.9685     0.0815     3.8750
Tobermory                          4.21             19.9                  0.62428     1.2989     0.8109     1.6871
Aberayron                          8.90             25.6                  0.94939     1.4082     1.3369     1.9830
Brixham Gas and Electricity Co.    3.13             30.4                  0.49554     1.4829     0.7348     2.1990
Chudleigh Co.                      7.28             16.7                  0.86213     1.2227     1.0541     1.4950
Foots Cray Co.                     1.92             77.8                  0.28330     1.8910     0.5357     3.5759
Lewes Co.                          1.14            120.1                  0.05690     2.0795     0.1183     4.3243
Newcastle Electric Light Co.       0.64             68.8                 -0.19382     1.8376    -0.3562     3.3768
Ramsgate Co.                       1.57             60.5                  0.19590     1.7818     0.3490     3.1748
Steyning Co.                       1.06             93.9                  0.02531     1.9727     0.0499     3.8915
West Devon Co.                     1.98             22.1                  0.29667     1.3444     0.3988     1.8074
Coatbridge and Airdrie Co.         0.68            196.2                 -0.16749     2.2927    -0.3840     5.2565
Skelmorlie Co.                     2.05             60.1                  0.31175     1.7789     0.5546     3.1645

Total                                                                     5.24928    50.1311     7.3008    97.1450

can judge it is as good as any simple curve is likely to be. It expresses
the general relation between x and y; but, naturally, local circumstances
cause individual values to deviate appreciably from this relation. Statistical
data which are not produced under laboratory conditions are very
often of this nature. The fitted curve expresses a general trend, but
individual cases may lie well away from it in a number of instances.

Fitting of More General Curves. 

17.20. Example 17.4.—We must now consider the fitting of curves
of order higher than the first.

Table 17.5 shows the percentage loss of weight (Y) for certain temperatures
(X) in experiments on the oven-drying of soils. Since X is
here the controllable factor, it is natural to take it as the independent
variable, and we shall express Y in terms of X.



[Fig. 17.4.—Straight line fitted to Logarithms of Data of Table 17.4.
Horizontal axis: logarithm of number of units sold per head of population.]

The data are shown graphically in fig. 17.5. We shall find successively
the straight line, quadratic parabola and cubic parabola of closest fit. We
shall therefore require sums of powers of X up to $S(X^6)$ and sums of
products up to $S(YX^3)$. We also require, for later work, $S(Y^2)$.

The preliminary work is shown in Table 17.5. We might, perhaps,
have abbreviated the arithmetic slightly by taking an origin of x at
X = 100 and of y at Y = 3, but the saving would not have been large.
Data of this kind frequently give rise to large figures in the higher sums,
and a machine is a great help in the calculation. For instance, with a
machine the sums $S(YX)$, etc., can be found by continuous addition,
without the necessity for writing each individual contribution in the
relative column.

For the straight line of closest fit, equations (17.8) become:

\[
\begin{aligned}
82.97 - 16a_0 - 2642a_1 &= 0\\
14{,}736.19 - 2642a_0 - 474{,}050a_1 &= 0
\end{aligned}
\]

whence

\[ a_0 = 0.660 \quad\text{and}\quad a_1 = 0.02741 \]

(more accurately, 0.659,759,789 and 0.027,408,722)
and the straight line is:

\[ y = 0.660 + 0.02741x \tag{a} \]
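The two equations above can be solved mechanically from the sums alone. The following is a minimal sketch in Python, not part of the original text; the function name `fit_line_from_sums` is our own, and the sums are those quoted in the equations for the soil data.

```python
def fit_line_from_sums(n, sum_x, sum_xx, sum_y, sum_yx):
    """Solve the two normal equations for y = a0 + a1*x.

        sum_y  - n*a0     - a1*sum_x  = 0
        sum_yx - a0*sum_x - a1*sum_xx = 0
    """
    det = n * sum_xx - sum_x * sum_x
    a0 = (sum_y * sum_xx - sum_x * sum_yx) / det
    a1 = (n * sum_yx - sum_x * sum_y) / det
    return a0, a1


# Sums quoted in the text for the soil data (n = 16 observations).
a0, a1 = fit_line_from_sums(n=16, sum_x=2642, sum_xx=474050,
                            sum_y=82.97, sum_yx=14736.19)
print(f"a0 = {a0:.9f}, a1 = {a1:.9f}")
```

The same two-by-two solution serves for any straight line once n, S(X), S(X^2), S(Y) and S(YX) are known.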

For the quadratic parabola, equations (17.8) are:

\[
\begin{aligned}
S(Y) - na_0 - a_1S(X) - a_2S(X^2) &= 0\\
S(YX) - a_0S(X) - a_1S(X^2) - a_2S(X^3) &= 0\\
S(YX^2) - a_0S(X^2) - a_1S(X^3) - a_2S(X^4) &= 0
\end{aligned}
\]



Table 17.5.—Curve-fitting to express the Relationship between Temperature
and Percentage Loss in Weight of Certain Soil Samples. (Data from
J. R. H. Coutts, "'Single Value' Soil Properties: V. On the Changes
Produced in a Soil by Oven-drying," Journal ...)

[Body of table not recoverable from the scan. Totals used in the text:
n = 16, S(X) = 2642, S(X^2) = 474,050, S(X^3) = 91,244,582,
S(Y) = 82.97, S(YX) = 14,736.19, S(Y^2) = 459.4363.]


These become, on substitution,

\[
\begin{aligned}
82.97 - 16a_0 - 2642a_1 - 474{,}050a_2 &= 0\\
14{,}736.19 - 2642a_0 - 474{,}050a_1 - 91{,}244{,}582a_2 &= 0\\
2{,}819{,}909.15 - 474{,}050a_0 - 91{,}244{,}582a_1 - 18{,}558{,}104{,}842a_2 &= 0
\end{aligned}
\]

giving

\[ a_0 = 3.551, \quad a_1 = -0.009291, \quad a_2 = 0.00010695 \]

(more accurately, 3.550,990,2, -0.009,291,235,7, and 0.000,106,954,12)

and the parabola is:

\[ y = 3.551 - 0.009291x + 0.00010695x^2 \tag{b} \]

For the cubic parabola, equations (17.8) are:

\[
\begin{aligned}
S(Y) - na_0 - a_1S(X) - a_2S(X^2) - a_3S(X^3) &= 0\\
S(YX) - a_0S(X) - a_1S(X^2) - a_2S(X^3) - a_3S(X^4) &= 0\\
S(YX^2) - a_0S(X^2) - a_1S(X^3) - a_2S(X^4) - a_3S(X^5) &= 0\\
S(YX^3) - a_0S(X^3) - a_1S(X^4) - a_2S(X^5) - a_3S(X^6) &= 0
\end{aligned}
\]

which become:

\[
\begin{aligned}
82.97 - 16a_0 - 2642a_1 - 474{,}050a_2 - 91{,}244{,}582a_3 &= 0\\
14{,}736.19 - 2642a_0 - 474{,}050a_1 - 91{,}244{,}582a_2 - 18{,}558{,}104{,}842a_3 &= 0\\
2{,}819{,}909.15 - 474{,}050a_0 - 91{,}244{,}582a_1 - 18{,}558{,}104{,}842a_2 - 3{,}080{,}204{,}225{,}802a_3 &= 0\\
\ldots{,}002{,}802.11 - 91{,}244{,}582a_0 - 18{,}558{,}104{,}842a_1 - 3{,}080{,}204{,}225{,}802a_2 - 858{,}077{,}008{,}755{,}250a_3 &= 0
\end{aligned}
\]

It is not really necessary to write out the large numbers of the later
equations as fully as we have done, and a certain amount of approximation
is allowable. The student should, however, be careful not to introduce it
too soon, as neglected quantities may become of cumulative importance
in the solution of the equations.

By straightforward but rather strenuous arithmetic we find:

\[ a_0 = 7.783, \quad a_1 = -0.08940, \quad a_2 = 0.0005875, \quad a_3 = -0.0000009189 \]

(more accurately, $a_0$ = 7.782,526,861, $a_1$ = -0.089,402,305,60,
$a_2$ = 0.000,587,470,234,2, $a_3$ = -0.000,000,918,801,060,8)

The smallness of the coefficients $a_2$ and $a_3$ does not mean that they are
of minor importance, since in the equation for y they are multiplied by
terms in $x^2$ and $x^3$, which may be large.

The cubic parabola is, then,

\[ y = 7.783 - 0.08940x + 0.0005875x^2 - 0.0000009189x^3 \]

which we may also write as:

Fig. 17.5 shows the data graphically, with the straight line and cubic 
parabola of closest fit. 






Fig. 17.5.—Straight Line and Cubic Parabola of Closest Fit to the
Data of Table 17.5.


17.21. Although a graph will usually suggest whether a straight line
or quadratic parabola is likely to give a satisfactory fit, it will not as a rule
be much guide in deciding whether further terms will repay the labour
of calculation. This can be judged, at least roughly, by calculating
the terms given by the polynomial (to as high a degree as it has been
carried) for the observed values of x, and then observing the run of the
residuals. If the signs run more or less at random it will hardly be
worth while to calculate another term; but if a series of positive residuals
is followed by a series of negative residuals, these by another series of
positive residuals, etc., it will probably be worth while to proceed further.
Moreover, the coefficients for a parabola of order k are no guide to those
of order k + 1. For instance, in Example 17.4, the values of $a_0$ for the
straight line, square parabola and cubic parabola are 0.660, 3.551, 7.783;
and those of $a_1$ are 0.02741, -0.009291, -0.08940. From this information
we could not guess even the sign of these coefficients in the parabola
of order 4, and if we wished to fit such a curve five equations of the type
(17.8) would have to be solved ab initio.
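The inspection of the run of the residuals described above is easily mechanised. The following Python sketch is an illustration only: the helper `sign_runs` and the data (values of y = x^2 at x = 0, 1, ..., 8, fitted by a straight line) are hypothetical and are not taken from the text.

```python
def sign_runs(residuals):
    """Lengths of consecutive runs of like-signed residuals (zeros are skipped)."""
    runs = []
    previous_sign = 0
    for r in residuals:
        sign = (r > 0) - (r < 0)
        if sign == 0:
            continue
        if sign == previous_sign:
            runs[-1] += 1
        else:
            runs.append(1)
            previous_sign = sign
    return runs


# Hypothetical data: a curved relation fitted by a least-squares straight line.
xs = list(range(9))
ys = [x * x for x in xs]

n = len(xs)
sum_x = sum(xs)
sum_xx = sum(x * x for x in xs)
sum_y = sum(ys)
sum_yx = sum(x * y for x, y in zip(xs, ys))
det = n * sum_xx - sum_x ** 2
a0 = (sum_y * sum_xx - sum_x * sum_yx) / det
a1 = (n * sum_yx - sum_x * sum_y) / det

residuals = [y - (a0 + a1 * x) for x, y in zip(xs, ys)]
print(sign_runs(residuals))
```

A few long runs, as here (positive, then negative, then positive), suggest that a further term is worth calculating; many short runs suggest that it is not.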

The student, therefore, should not fall into the error of thinking that
parabolas of successive orders will resemble each other in their lower
terms, or that the fitting of a curve of order k + 1 is merely a question of
adding an extra term to a curve of order k. It would be a great convenience
if this were so, and, in fact, methods have been devised whereby
one variate can be expressed in terms of certain polynomials of the other
in such a way that this advantage is secured. The theory of these
so-called "orthogonal" polynomials is, however, outside the scope of
the present work, and we would refer the student who is interested to
the references for this chapter.





The Case when the Independent Variable Proceeds by Equal 
Steps. 

17.22. When the independent variable x proceeds by steps of equal
amount h, the arithmetical solution of equations (17.8) can be greatly
simplified, particularly if the number of values is odd. In such a case
we take h as the unit of x and an origin at the middle term. The values
of x will then be -k, -(k - 1), -(k - 2), ..., -2, -1, 0, 1, 2, ...,
(k - 2), (k - 1), k, and owing to the symmetry of this series the sums of
odd powers of x will vanish, i.e. $S(X)$, $S(X^3)$, $S(X^5)$, etc. are all zero.
Equations (17.8) then become, taking p as odd,

\[
\begin{aligned}
S(Y) - na_0 - a_2S(X^2) - a_4S(X^4) - \ldots &= 0\\
S(YX) - a_1S(X^2) - a_3S(X^4) - \ldots &= 0\\
&\;\;\vdots\\
S(YX^{p-1}) - a_0S(X^{p-1}) - a_2S(X^{p+1}) - \ldots &= 0\\
S(YX^{p}) - a_1S(X^{p+1}) - a_3S(X^{p+3}) - \ldots &= 0
\end{aligned}
\]

and not only is the number of terms reduced, but the equations split
into two sets, one in $a_0$, $a_2$, $a_4$, etc., and the other in $a_1$, $a_3$, $a_5$, etc.
Moreover, the sums of even powers of X are twice the sums of powers of the
first k natural numbers, which may be easily found, either from tables
or from known formulae.
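The two properties just stated may be checked numerically. The sketch below is our own illustration in Python, taking k = 6 (thirteen equally spaced values); the helper name `power_sum` is hypothetical.

```python
def power_sum(values, power):
    """Sum of the given power of each value."""
    return sum(v ** power for v in values)


k = 6
xs = range(-k, k + 1)          # -6, -5, ..., 5, 6
first_k = range(1, k + 1)      # 1, 2, ..., 6

for p in range(1, 7):
    total = power_sum(xs, p)
    if p % 2:
        print(f"S(X^{p}) = {total}  (odd power: vanishes)")
    else:
        print(f"S(X^{p}) = {total}  (twice {power_sum(first_k, p)})")

# One of the known formulae: 1^2 + 2^2 + ... + k^2 = k(k + 1)(2k + 1)/6.
sum_squares = k * (k + 1) * (2 * k + 1) // 6
```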

Example 17.5.—Table 17.6 shows the population of England and
Wales in certain census years from 1811 onwards. Taking the time as
the independent variable, we choose as the unit of X the period of ten years,
and the origin at the mid-point of the range, 1871. The preliminary work
for the fitting of curves up to the cubic form is shown in the table.

For the cubic parabola, equations (17.8) are, then,

\[
\begin{aligned}
314.09 - 13a_0 - 182a_2 &= 0\\
474.77 - 182a_1 - 4550a_3 &= 0\\
4520.45 - 182a_0 - 4550a_2 &= 0\\
11{,}632.97 - 4550a_1 - 134{,}342a_3 &= 0
\end{aligned}
\]

whence

\[ a_0 = 23.299, \quad a_1 = 2.895, \quad a_2 = 0.06153, \quad a_3 = -0.01147 \]

The parabola is, therefore,

\[ y = 23.299 + 2.895x + 0.06153x^2 - 0.01147x^3 \tag{a} \]

Fig. 17.6 shows the data graphically, together with this cubic.
Incidentally, this example illustrates one point of some importance.
Over the years 1811 to 1931 the cubic gives a fair fit, and might be used
to estimate the population at intermediate years. But for extrapolation
it is of very little value. We could not estimate the population for 1951
with any confidence by putting x = 8 in the cubic; still less that for later
years. Unless there are good reasons for supposing that the fitted curve
is an accurate representation of a theoretical relationship, it is dangerous
to assume that a fitted parabola can be used outside the range for which
it was ascertained.
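The arithmetic of this example may be reproduced as follows. This is a sketch in Python rather than the method of the text; the helper names `s` and `solve_2x2` are our own. It uses the census figures of Table 17.6 and exploits the splitting of equations (17.8) into an even set (for a0, a2) and an odd set (for a1, a3).

```python
# Population of England and Wales (millions), census years 1811 to 1931.
population = [10.16, 12.00, 13.90, 15.91, 17.93, 20.07, 22.71,
              25.97, 29.00, 32.53, 36.07, 37.89, 39.95]
xs = list(range(-6, 7))   # decades measured from 1871


def s(power, with_y=False):
    """S(X^power), or S(Y X^power) when with_y is True."""
    if with_y:
        return sum(y * x ** power for x, y in zip(xs, population))
    return sum(x ** power for x in xs)


def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*u + a12*v = b1, a21*u + a22*v = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


n = len(xs)
a0, a2 = solve_2x2(n, s(2), s(2), s(4), s(0, True), s(2, True))        # even set
a1, a3 = solve_2x2(s(2), s(4), s(4), s(6), s(1, True), s(3, True))     # odd set


def cubic(x):
    return a0 + a1 * x + a2 * x ** 2 + a3 * x ** 3


print(f"a0={a0:.3f} a1={a1:.3f} a2={a2:.5f} a3={a3:.5f}")
print(f"value of the cubic at x = 8 (the year 1951): {cubic(8):.2f}")
```

The last line merely evaluates the cubic outside the fitted range; as the text warns, such a figure should not be treated as an estimate.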


Table 17.6.—Curve-fitting to Growth of Population in England and Wales.
(Data from Registrar-General's Statistical Review of England and Wales
(year illegible), Tables, Part II.)

        Population
Year.   (millions)   X.   X^2.   X^3.   X^4.    X^6.      YX.      YX^2.      YX^3.
            Y.

1811      10.16      -6    36    -216   1,296   46,656   -60.96    365.76   -2,194.56
1821      12.00      -5    25    -125     625   15,625   -60.00    300.00   -1,500.00
1831      13.90      -4    16     -64     256    4,096   -55.60    222.40     -889.60
1841      15.91      -3     9     -27      81      729   -47.73    143.19     -429.57
1851      17.93      -2     4      -8      16       64   -35.86     71.72     -143.44
1861      20.07      -1     1      -1       1        1   -20.07     20.07      -20.07
1871      22.71       0     0       0       —        —      —         —           —
1881      25.97       1     1       1       1        1    25.97     25.97       25.97
1891      29.00       2     4       8      16       64    58.00    116.00      232.00
1901      32.53       3     9      27      81      729    97.59    292.77      878.31
1911      36.07       4    16      64     256    4,096   144.28    577.12    2,308.48
1921      37.89       5    25     125     625   15,625   189.45    947.25    4,736.25
1931      39.95       6    36     216   1,296   46,656   239.70  1,438.20    8,629.20

Total    314.09       0   182       —   4,550  134,342   474.77  4,520.45   11,632.97



Fig. 17.6.—Cubic Parabola fitted to the Data of Table 17.6.

It would be instructive for the student to fit merely a segment of some
actual series and note how rapidly the curve calculated from the segment
diverged from the observations outside its limits. It has been shown that
even within the limits of the fitted observations the fit tends to be worst
as the limits are approached. The higher powers of x become of greater
and greater effect the more we diverge from the centre of the fitted
segment and tend, so to speak, to "wag the tail" of the curve.

17.23. If the number of values of x is even, we have a choice of two
methods of procedure. We can take h as unit and the origin at one of
the two middle values; or we can take ½h as unit and origin midway
between the two central values. In the first case, the sums of odd powers
will no longer vanish, but they will nevertheless be easily calculable,
since all terms except a single outlying member in the summation will
cancel out in pairs. In the second case the sums of odd powers will
vanish, but the other sums will no longer be twice those of the first k
natural numbers, but of the first k odd numbers. In either case the solution
of the equations (17.8) is not difficult.
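The two methods may be compared on a small case. The Python sketch below is our own illustration with six equally spaced values; the variable names are hypothetical.

```python
def power_sum(values, power):
    """Sum of the given power of each value."""
    return sum(v ** power for v in values)


n_values = 6   # an even number of equally spaced observations

# First method: unit h, origin at one of the two middle values.
xs_middle = [i - (n_values // 2 - 1) for i in range(n_values)]    # -2 ... 3

# Second method: unit h/2, origin midway between the two central values.
xs_midway = [2 * i - (n_values - 1) for i in range(n_values)]      # -5, -3, ..., 5

# First method: odd sums reduce to the powers of the single outlying member, 3.
print(xs_middle, power_sum(xs_middle, 1), power_sum(xs_middle, 3))

# Second method: odd sums vanish; even sums are twice those of 1, 3, 5.
print(xs_midway, power_sum(xs_midway, 1), power_sum(xs_midway, 2))
```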

Calculation of the Sum of Squares of Residuals.

17.24. The eye is not a reliable guide to the closeness with which a
given curve lies to data, and it is desirable to have some more accurate
measure of the closeness of fit. For this purpose we require to be able
to find the sum of the squares of residuals U. We know by our method
of ascertaining the curve that this will be less than the corresponding
quantity for any other curve of the same degree, and our interest is centred
on how close this is to the ideal value zero.

To calculate the sum of squares of residuals it is not necessary to
calculate each separate residual. In fact, for the parabola of order p we
have:

\[
\begin{aligned}
U &= S(Y - a_0 - a_1X - a_2X^2 - \ldots - a_pX^p)^2\\
  &= S\{Y(Y - a_0 - a_1X - \ldots - a_pX^p)\}
\end{aligned}
\]

for the terms of the type $S\{a_jX^j(Y - a_0 - a_1X - \ldots - a_pX^p)\}$ vanish in
virtue of equations (17.8). Hence,

\[ U = S(Y^2) - a_0S(Y) - a_1S(YX) - \ldots - a_pS(YX^p) \tag{17.13} \]

The constants a and the sums which appear in this expression have
already been found, with the exception of $S(Y^2)$ in some cases. With
this additional quantity we can find U.
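Formula (17.13) may be checked against a direct summation of squared residuals. The sketch below, in Python, is an illustration of our own: it fits a straight line to the census figures of Table 17.6 (for which S(X) = 0, so that the two normal equations decouple), and the function names `u_direct` and `u_shortcut` are hypothetical.

```python
# Table 17.6: population (millions), with X in decades measured from 1871.
ys = [10.16, 12.00, 13.90, 15.91, 17.93, 20.07, 22.71,
      25.97, 29.00, 32.53, 36.07, 37.89, 39.95]
xs = list(range(-6, 7))
n = len(xs)

sum_y = sum(ys)
sum_yx = sum(y * x for x, y in zip(xs, ys))
sum_yy = sum(y * y for y in ys)
sum_xx = sum(x * x for x in xs)

# Least-squares line; since S(X) = 0 each constant has its own equation.
a0 = sum_y / n
a1 = sum_yx / sum_xx


def u_direct(c0, c1):
    """Sum of squared residuals, one residual at a time."""
    return sum((y - c0 - c1 * x) ** 2 for x, y in zip(xs, ys))


def u_shortcut(c0, c1):
    """Formula (17.13); valid only when c0, c1 satisfy the normal equations."""
    return sum_yy - c0 * sum_y - c1 * sum_yx


print(u_direct(a0, a1), u_shortcut(a0, a1))          # agree
print(u_direct(24.16, 2.61), u_shortcut(24.16, 2.61))  # rounded constants: disagree
```

The second line of output shows why the constants must be carried to many decimal places: the short formula is a small difference of large quantities, and is sensitive to rounding in the a's in a way the direct sum is not.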

Example 17.6.—Let us find U for the data of Example 17.4 for the
straight line and the two parabolas.

For the line

\[ U = S(Y^2) - a_0S(Y) - a_1S(YX) \]

Here

\[
\begin{aligned}
S(Y) &= 82.97, & S(YX) &= 14{,}736.19, & S(Y^2) &= 459.4363,\\
a_0 &= 0.659{,}759{,}789, & a_1 &= 0.027{,}408{,}722
\end{aligned}
\]

Hence,

\[ U = 459.4363 - 54.74027 - 403.90014 = 0.7959 \]

For the quadratic parabola:

\[ U = S(Y^2) - a_0S(Y) - a_1S(YX) - a_2S(YX^2) \]

and here

\[ a_0 = 3.550{,}990{,}2, \quad a_1 = -0.009{,}291{,}235{,}7, \quad a_2 = 0.000{,}106{,}954{,}12 \]

whence

\[ U = 0.1271 \]

Similarly, for the cubic

\[ U = 0.0485 \]

The value of U therefore decreases from 0.7959 for the straight line to
0.0485 for the cubic. This is what we should expect, for the addition of
extra terms means that we have additional constants at our disposal in
the task of minimising U.

To obtain U with any accuracy by the foregoing method it is necessary
to ascertain the a's to a considerable number of decimal places.

Measurement of the Closeness of Fit. 

17.25. The value of U enables us to make some sort of comparison
between the fits of different curves to the same data; but it is not, in itself,
a satisfactory measure of fit, since it does not permit of the comparison
of the fits of curves to different data. The measure U/n, which is the
variance of errors of estimation, suggests itself, but this, like U, is not
absolute, being dependent on the units in which we are working. For a
satisfactory measure some form of ratio would have to be taken.

Such a ratio arises in a natural way if we consider the correlation
between the actual values of Y and those "predicted" by the polynomial.

Let us, without loss of generality, suppose that the values are measured
from their mean, and let y be the value given by the polynomial and Y
be the actual value. Then, as in 17.24,

\[ S(y^2) = S(Yy) \tag{17.14} \]

\[
\begin{aligned}
U &= S\{Y(Y - y)\}\\
  &= S(Y^2) - S(Yy)
\end{aligned}
\tag{17.15}
\]

Writing $\sigma_Y$, $\sigma_y$ for the standard deviations of Y and y, and R for the
correlation between them, we get, from (17.14),

\[ \sigma_y^2 = R\,\sigma_Y\sigma_y \]

or

\[ \sigma_y = R\,\sigma_Y \tag{17.16} \]

and from (17.15),

\[ U = n\sigma_Y^2 - nR\,\sigma_Y\sigma_y \]

or

\[ \frac{U}{n} = \sigma_Y^2 - R\,\sigma_Y\sigma_y \tag{17.17} \]

Hence, substituting for $\sigma_y$ from (17.16),

\[ R^2 = 1 - \frac{U}{n\sigma_Y^2} \tag{17.18} \]

which gives the correlation in terms of the ratio of U/n and the variance
$\sigma_Y^2$.

R is, in fact, analogous to the multiple correlation coefficient and the
correlation ratio, and the equation (17.18) should be compared with
equation (13.8), page 244, and equation (14.15), page 278.

Example 17.7.—In Example 17.1 we have, using the data of Table 17.2
and the constants found:

\[
\begin{aligned}
\sigma_Y^2 &= 67.28998 - (6.126)^2 = 29.762{,}104\\
U &= 1.835{,}777{,}255\\
R^2 &= 1 - \frac{1.835{,}777{,}255}{297.62104} = 0.993{,}831{,}830\\
R &= 0.99691
\end{aligned}
\]

For the soil data of Examples 17.4 and 17.6 we find:

For the straight line R = 0.98627
For the cubic R = 0.99917

Thus, judged by the value of R, the straight line of Example 17.1 is a
better fit than that of Example 17.4, but a worse fit than the cubic of the
latter.
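The values of R quoted above follow from (17.18) in a line or two of arithmetic. The Python sketch below is an illustration only; the function name `correlation_r` is our own, and the quantity n times the variance of Y for the soil data is obtained, as an assumption consistent with the sums quoted in Example 17.6, from S(Y^2) - S(Y)^2/n.

```python
import math


def correlation_r(u, n_times_variance):
    """R from (17.18): R^2 = 1 - U / (n * variance of Y)."""
    return math.sqrt(1.0 - u / n_times_variance)


# Example 17.1 (figures quoted in the text).
r_example_17_1 = correlation_r(1.835777255, 297.62104)

# Soil data of Examples 17.4 and 17.6: n = 16, S(Y) = 82.97, S(Y^2) = 459.4363.
n, sum_y, sum_yy = 16, 82.97, 459.4363
n_var_soil = sum_yy - sum_y ** 2 / n

r_line = correlation_r(0.7959, n_var_soil)
r_cubic = correlation_r(0.0485, n_var_soil)

print(f"{r_example_17_1:.5f} {r_line:.5f} {r_cubic:.5f}")
```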

17.26. As a general comment on the scope of the methods of curve-fitting
described in this chapter, we may remark that although polynomials
can always be fitted to data, the student should not assume that even the
polynomial of closest fit will necessarily be a satisfactory fit. It may
exhibit peculiarities of behaviour which are entirely absent from the data
themselves. He may well ask, when confronted by a given set of data,
how he is to know whether they may be satisfactorily represented by a
polynomial. The answer is that he must fit one and see. Some further
remarks on this point are given later in 24.12, where similar questions
arise in connection with interpolation and graduation.


SUMMARY. 

1. A parabola of the form $y = a_0 + a_1x + a_2x^2 + \ldots + a_px^p$ may be
fitted to data by choosing the constants a so that the sum of squares of
residuals $U = S(Y - a_0 - a_1X - a_2X^2 - \ldots - a_pX^p)^2$ is a minimum.

2. This method leads to the equations

\[
\begin{aligned}
S(Y) - na_0 - a_1S(X) - a_2S(X^2) - \ldots - a_pS(X^p) &= 0\\
S(YX) - a_0S(X) - a_1S(X^2) - a_2S(X^3) - \ldots - a_pS(X^{p+1}) &= 0\\
&\;\;\vdots\\
S(YX^p) - a_0S(X^p) - a_1S(X^{p+1}) - a_2S(X^{p+2}) - \ldots - a_pS(X^{2p}) &= 0
\end{aligned}
\]

3. Non-linear data may sometimes be reduced to the linear form by a
simple transformation of one or both the variables.

4. The sum of squares of residuals may be found from the formula

\[ U = S(Y^2) - a_0S(Y) - a_1S(YX) - \ldots - a_pS(YX^p) \]

5. One measure of the goodness of fit of the parabola to the data is
given by R, the correlation between actual and "predicted" values of the
variate. R is given by

\[ R^2 = 1 - \frac{U}{n\sigma_Y^2} \]

where Y is the dependent variable.


EXERCISES. 

17.1. Fit a straight line and parabolas of the second and third orders to the
following data, taking X to be the independent variable:

        X.      Y.
        0       1
        1       1.8
        2       1.3
        3       2.5
        4       6.3

and find the sum of squares of residuals in the three cases.

17.2. (Data quoted by P. L. Fegiz, "Le variazioni stagionali della natalità,"
Metron, vol. 5, 1925, No. 4, p. 127.) The following figures show the relation
between duration of marriage and average number of children per marriage in
Norway in 1920:

        Duration of Marriage      Average Number of
        (Years).                  Children.
             0-1                      0.48
             5-6                      2.09
            10-11                     3.26
            15-16                     4.33
            20-21                     5.14
            25-26                     5.63
            30-31                     5.77

By the method of least squares find equations of the first, second and third
orders expressing the number of children in terms of the duration of marriage.
Compare the values given by these expressions for a duration of 17-18 years
with the true value 4.67.

17.3. The pressure of a gas and its volume are known to be related by an
equation of the form $pv^{\gamma}$ = constant.

In a certain experiment the following volumes of a quantity of the gas were
observed for the pressures specified. Find the value of $\gamma$ by fitting a straight
line to the logarithms of p and v, taking p to be the independent variable.

        p (kg. per square cm.)    0.5    1.0    1.5    2.0    2.5    3.0
        v (litres)                1.62   1.00   0.75   0.62   0.52   0.46


17.4. (Data from the records of the Farm Economics Branch, School of
Agriculture, Cambridge, England.)

The following are the gross output and the gross output per £100 of labour

        Gross Output      Gross Output
        (Units).          per £100 Labour
                          (Units).
             03                 40
            223                155
            755                188
            105                 78
           1535                315
           3103                290
           2238                250
           1228                231
           2005                255

Fit a quadratic parabola to these data, taking gross output as the independent
variable.



CHAPTER 18. 


PRELIMINARY NOTIONS ON SAMPLING. 

The Problem. 

18.1. In practical problems the statistician is often confronted with
the necessity of discussing a universe of which he cannot examine every
member. For example, an inquirer into the heights of the population
of Great Britain cannot afford the time or expense required to measure
the height of each individual; nor can a farmer who wants to know what
proportion of his potato crop is diseased examine every single potato.

In such cases the best an investigator can do is to examine a limited
number of individuals and hope that they will tell him, with reasonable
trustworthiness, as much as he wants to know about the universe from
which they come. We are thus led naturally to the question: What
can be said about a universe of which we can examine only a limited
number of its members? This question is the origin of the Theory of
Sampling.

18.2. A sample from a universe is a selected number of individuals 
each of which is a member of the universe. As a very special case the 
sample may consist of the entire universe. 

It is a matter of common belief, founded on experience and intuition,
that a sample will tell us something about the parent universe. The
corn merchant, whose livelihood depends on his ability to ascertain
the quality of the grain which he handles, is content to assess it by thrusting
a conical trowel into the middle of a sack and scrutinising the sample
he gets. He believes that the sample will be representative of the whole,
and experience justifies him. He buys and sells on the basis of judgment
from samples.

It is also a matter of common belief that the larger a sample becomes 
the more likely it is to reflect accurately the conditions in the parent 
universe. 
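The belief that larger samples reflect the parent universe more accurately can be illustrated by a small sampling experiment. The Python sketch below is our own and rests on stated assumptions: the universe is a hypothetical one of 10,000 heights generated artificially (normal, mean 65 inches, standard deviation 3 inches), samples are drawn without replacement, and the function name `spread_of_sample_means` is hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so that the experiment can be repeated

# Hypothetical finite universe of 10,000 heights (inches).
universe = [random.gauss(65, 3) for _ in range(10_000)]
universe_mean = statistics.fmean(universe)


def spread_of_sample_means(sample_size, repeats=200):
    """Standard deviation of the means of repeated samples of a given size."""
    means = [statistics.fmean(random.sample(universe, sample_size))
             for _ in range(repeats)]
    return statistics.pstdev(means)


spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(1000)

print(f"universe mean: {universe_mean:.2f}")
print(f"spread of sample means, samples of 10:   {spread_small:.3f}")
print(f"spread of sample means, samples of 1000: {spread_large:.3f}")
```

The means of the larger samples cluster much more closely about the mean of the universe than do those of the smaller samples.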

To these and similar beliefs the theory of sampling gives a logical
basis and a system of quantitative measurement. In this chapter we
give a general survey of the fundamental ideas and the technique of
sampling. In later chapters we shall develop these ideas and discuss their
applications in various fields.

Types of Universe. 

18.3. Before we consider sampling itself, however, it is desirable
to look a little closer into the various types of universe which we shall
have to investigate.

By a finite universe we shall mean a universe which contains a 
finite number of members. Such, for instance, is the universe of inhabi¬ 
tants of Great Britain and the universe of books in the British Museum. 


Similarly, by an infinite universe we shall mean a universe containing 
an infinite number of members. Such, for instance, is the universe of 
pressures at various points in the atmosphere, or the universe of 
possible sizes of the wheat crop in tons, for, although there are limits to 
the size, the actual tonnage can take any numerical value within those 
limits. 

In many cases the number of members in a universe is so large as to
be practically infinite. Moreover, a theoretical discussion of an infinite
universe is frequently easier than a discussion of a finite universe, and a
large class of problems may be treated by assuming that the parent
universe is infinite, without introducing any sensible error.

It may be worth remarking that in a few cases we may be ignorant
whether or not the universe of discussion is infinite. The universe of
stars is an example.

Existent and Hypothetical Universes. 

18.4. By the logical extension of the idea of a universe of concrete 
objects, which we shall call an existent universe, we are able to construct 
the idea of a hypothetical universe. 

Consider the throws of a die. Each throw will be regarded as an 
individual. There is an infinite number of throws which can be made 
with the die, provided that it does not wear out. Let us then define as 
our universe of discussion all the possible throws of the die. 

In doing so we are clearly making some new step; for our universe
is to be conceived as having no existence in reality but only in imagination.
We can give actuality to some members of the universe by throwing the
die, but we can never produce them all. Even if the die were locked
away in a safe and never thrown at all there would still be a universe
of possible throws.

Such a universe is called a hypothetical universe. We may define
it formally as the aggregate of all the conceivable ways in which a specified
event can happen. Other examples of hypothetical universes are the
universe of all values which the bank rate can have in ten years' time,
and the universe of the possible ways in which three balls can be arranged
on a billiard table.

18.5. A hypothetical universe may, in fact, be imagined around
any observed event. We have only to picture all the circumstances
before the event happens; the universe is then all the possible ways in
which it could happen. Which of the ways it will happen does not affect
the universe. We know that from "the chaos of predestination and
the night of our forebeing" some one individual will emerge to assume
the mantle of reality; but which one that will be is another and more
difficult question.

18.6. The student of metaphysics would perhaps criticise the
thoughts expressed briefly in the previous two sections, but we have no
space to go further into the philosophical implications of the idea of
hypothetical universes. The problems which arise in this connection
have, however, far more than an abstract interest. They lie at the root
of a great many practical statistical problems, and most students, however
utilitarian their outlook, will find that a clear perception of the issues
involved may save a lot of thought and labour at a subsequent stage.
The literature on this subject, unfortunately, is scattered; but reference
may with advantage be made to the works cited in refs. (388)-(390).

Universe of Universes. 

18.7. Just as a universe may contain a number of sub-universes,
so any given universe may be a member of some more widely defined
universe. For example, the universe of inhabitants of Great Britain is
a member of the universe of universes, each of which consists of the
inhabitants of some European country.

Similarly, any existent universe may be regarded as one member of a
hypothetical universe of universes. For instance, the normal universe
of men whose heights have a mean of 65 inches and standard deviation
3 inches is a member of the hypothetical universe of all populations
which are normally distributed with respect to height.

18.8. We shall sometimes have to discuss aggregates which it is
difficult to regard as composed of individual members at all; for example,
we may wish to sample a reservoir of water to test for pollution. In
theory, perhaps, we could in such a case regard the reservoir as a universe
composed of molecules each of which was an individual, but in practice,
as we shall see, this is not usually a convenient method of approach.
Such universes may frequently be treated as composed of arbitrary units,
e.g. the reservoir may be regarded as composed of so many pints of fluid.
Similarly, a 280-lb. sack of flour may be regarded as composed of 4480
ounces, and we can, if we like, regard it as weighed out into one-ounce
packets.

18.9. We can now turn to discuss the aims which usually underlie 
a sampling inquiry. 

Briefly, the fundamental object of sampling is to give the maximum 
information about the parent universe with the minimum effort. We 
must, therefore, consider the type of information we require and the 
methods by which it is to be obtained. 

18.10. In sampling a universe we usually have in mind one or more
of its variates. For instance, when we sample the population of Great
Britain, we are not so much interested in the individuals as human beings
as in one of their qualities, such as height or weight, or perhaps the correlation
between height and weight. Our object will then be to get, from the
sample, an idea of the frequency-distribution in the parent universe
according to the chosen variates.

The ideal for this purpose would be to express the distribution in some
mathematical form such as a Pearson curve (10.48). It may be, however,
that the parent universe will not admit of this representation, or that the
sample is not large enough for us to venture on it with any confidence.

In such cases we attempt to find estimates of certain constants of the
parent universe. Very often this is all we need. We can, for example,
form a very fair idea of the height distribution of the population of Great
Britain if we know the mean and the standard deviation. If we can go
further, and find the third and fourth moments, our idea will be better still.

Theory of Estimation. 

18.11. Hence, a large part of the theory of sampling is devoted to
finding from the sample estimates of certain constants of the parent
universe. Such constants include the measures of position and of
dispersion, together with the moments and measures of skewness; and, in
multivariate universes, the various total and partial correlations.

In general, there are more ways than one of estimating a constant from 
the data of the sample. Some of these ways will be better than others. 
The Theory of Estimation treats of these and cognate matters. It 
seeks to investigate the conditions which an estimate should obey, what are
the best estimates to employ in given circumstances, and how good other 
estimates are in comparison. 

Precision of Estimates. 

18.12. It will be obvious that knowledge derived from a sample is not
of the categorical kind customary in mathematics. If we have 1000 balls
in a bag and draw 999 of them which turn out to be black, it is always
possible that the remaining one is of some other colour. It is, however,
so improbable, that in most practical cases we should be justified in
concluding that the balls were all black.

If we did draw such a conclusion, and acted upon it, we should be basing
our action, not upon certainty, but on probability. One does this kind
of thing, of course, in nearly all everyday actions almost without noticing
it. Some events, such as the death of a man before reaching the age of
150, have such a high degree of probability that we never regard them as
other than certain; other events, such as the possibility of rain to-morrow,
are so uncertain that we should hesitate to make an important decision
contingent upon them.

18.13. The second aim of the theory of sampling is, therefore, to
determine as objectively as possible what degree of confidence we can put
in our estimates when they are obtained. This we do in terms of
probability as far as we can; if this proves impossible, we sometimes have to
rely on intuitive impressions or the results of previous experience, which
are not expressible in quantitative terms.

Put in another way, we may say that our object is to determine the
precision of an estimate. We attempt to do this by assigning limits to 
the probable divergence between the estimate based on the sample and the 
true value of the estimated quantity in the universe. 

18.14. The accuracy of the estimate will depend on (a) the way in
which the estimate is made from the data of the sample, and (b) the way
in which the sample was obtained. Consideration of the first leads us
again to the theory of estimation. The second leads us to study the 
technique of sampling and the design of statistical inquiries. 

Tests of Significance. 

18.15. If the sample is small we cannot, as a rule, assign to the
estimates we obtain sufficiently narrow limits to locate the universe value
with any serviceable accuracy. For example, a correlation of +0.5 in a
sample of twelve might arise, rather infrequently, from a normal universe
in which the true correlation was as high as +0.9 or as low as zero. For
such samples our questions are accordingly framed in more qualitative
terms: we do not ask, “What is the value of the correlation in the
universe?” but, “Is the observed value significant of the existence of any
correlation at all in the universe, whatever its value?” In other words,
we wish to know whether the observed value could have arisen from a
universe in which the true correlation is zero. If our conclusion is that it
could not, we may say that the sample value is significant of correlation,
although we cannot say with much confidence what that correlation is.

Much of the investigation arising out of small samples is thus of a rather 
special character, and deals with tests of significance. The methods 
developed for the purpose of conducting such tests can be, and not
infrequently are, applied also to large samples, either alone or supplementary
to the direct approach of forming more or less precise estimates of the 
various quantities which specify the parent universe. 

Types of Sampling. 

18.16. The process of forming a sample consists of choosing a
predetermined number of individuals from the parent universe. The choice
may be exercised in three ways : 

(a) By selecting the individuals at random (the meaning of “ random ” 
is discussed below). 

(b) By selecting the individuals according to some purposive principle.

(c) By a mixture of (a) and (b). 

Thus, in taking a sample of the inhabitants of Great Britain to study 
their income we might, according to method (a), select the individuals 
at random from census returns; or according to (b) we might, knowing 
roughly the average incomes in various age-groups, purposely select from 
each group an individual whose income was somewhere near the average in 
that group ; or (c) we might decide to take ten individuals from each group 
and select those ten by method (a). 

18.17. Sampling of type (a) is called random sampling. That of
type (b) is called purposive sampling. That of type (c) is sometimes
referred to as mixed sampling. If the universe is divided into “strata”
by purposive methods and then a portion of the sample is taken from 
each “stratum,” the sampling is said to be stratified. 
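
The mixed procedure of 18.16 (c), in which a fixed number of individuals is drawn at random from each stratum, can be sketched in Python as follows. This is only an illustration: the age-groups, their sizes and the identifying numbers are hypothetical, and the function name `stratified_sample` is introduced here for convenience.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Take `per_stratum` individuals at random from each stratum."""
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical age-groups, each holding made-up identifying numbers.
strata = {
    "20-29": list(range(0, 100)),
    "30-39": list(range(100, 180)),
    "40-49": list(range(180, 240)),
}

rng = random.Random(1)  # seeded only so that the sketch is repeatable
sample = stratified_sample(strata, 10, rng)
```

The division into strata is the purposive step; the choice within each stratum is left entirely to the random number generator.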

The application of each of these types may be affected by what is known
as bias. This is the name given to perturbations which influence the
nature of the choice and make it something other than what the
experimenter intends it to be. Bias may be due to imperfect instruments, the
personal qualities of the observer, defective technique, or other causes.
Like experimental error, it is difficult to eliminate entirely, but usually
may be reduced to relatively small dimensions by taking proper care.

By an obvious extension of the nomenclature, we talk of a sample 
obtained by random sampling as a random sample, that obtained by 
purposive sampling as a purposive sample, and so on. 

Random Sampling. 

18.18. The reader no doubt already has some intuitive ideas about
randomness of choice. We may give a formal definition of random
sampling by saying that the selection of an individual from a universe is
random when each member of the universe has the same chance of being 
chosen. Similarly, a sample of n individuals is random when it is chosen 
in such a way that, when the choice is made, all possible samples of n have 
an equal chance of being selected. 
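
As a minimal sketch of this definition, assuming a small hypothetical universe of five numbered individuals, the Python standard library can list every possible sample of n and draw one of them. The seed is fixed only to make the sketch repeatable, and a pseudo-random generator is itself a stand-in for a truly random process.

```python
import itertools
import random

# Hypothetical universe of five numbered individuals (an assumption for illustration).
universe = [1, 2, 3, 4, 5]
n = 2

# Every possible sample of n individuals; under random sampling each of the
# ten should have the same chance of being the one selected.
all_samples = list(itertools.combinations(universe, n))

# One sample of n distinct individuals, drawn without replacement.
rng = random.Random(0)
sample = rng.sample(universe, n)
```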



18.19. The first question arising out of this definition which we have
to consider is : How are we to obtain a random sample ? 

This question is more difficult than it appears at first sight. It might 
be thought that any purely haphazard method of selection would give a 
random sample. For example, if we wished to obtain a random sample of 
local tradesmen, one way which suggests itself is to take a Trades Directory, 
open it “ at random ” and take the first name on which the eye alights, 
repeating the process until the sample is of the required size. Or again, 
if we wished to obtain a random sample of wheat growing in a field, it might 
be thought that a satisfactory method would be to throw a hoop in the air 
“ at random ” and select all the plants over which it fell. 

18.20. That such methods are apt to be deceptive may be seen from
the two examples we have just given. In the first, if we consulted a Trades
Directory which had already been used, we should probably find that it
opened at some pages more readily than at others; we should therefore
tend to get the more popular tradesmen. Moreover, our eye might tend
to be caught by long names or peculiar names. In either case some
tradesmen would have a greater chance of being chosen than others, and the
sample would not be random.

Again, in the second example, our hoop might tend to be caught by the
taller ears of wheat, or we might tend unconsciously to throw it towards
parts of the field where the wheat looked to be about the average height. 
These and other factors would destroy the random character of the 
sampling. 

Human Bias. 

18.21. Experience has, in fact, shown that the human being is an
extremely poor instrument for the conduct of a random selection.
Wherever there is any scope for personal choice or judgment on the part of the
observer, bias is almost certain to creep in. Nor is this a quality which
can be removed by conscious effort or training. Nearly every human being
has, as part of his psychological make-up, a tendency away from true
randomness in his choices.

We may illustrate the unreliability of free choice on the part of even a
trained observer by taking an example of height measurements in samples
of wheat plants. In the course of certain work at the Rothamsted
Experimental Station, sets of eight wheat plants were selected for
measurement. Six of these shoots were chosen by purely random methods. The
other two were chosen “at random” by eye. If, in any set, the eight
shoots were ranged in order of magnitude, the two chosen by eye could
have any places from one to eight; and if they, in common with the other
six, were really random, they should have occupied these places with equal
frequency in a reasonably large number of sets. Table 18.1 shows the
resulting frequencies in the ranks one to eight for 116 sets taken on
31st May (before the ears of wheat had formed) and 112 sets taken on
28th June (after the ears had formed).

Fig. 18.1 shows the same results graphically, the dotted line giving 
the frequencies to be expected if the choice was really random. 

Table 18.1.—Height Measurements of Wheat: Frequencies of Plants Chosen by Eye
in Ranks 1-8. (F. Yates, “Some Examples of Biased Sampling,” Annals of
Eugenics, vol. 6, 1935, p. 202.)

                            Ascending Order of Magnitude: Rank.              Expectation
Date.     Observation.      1    2    3    4    5    6    7    8    Total.   in Each Class.
May 31    Shoot height      9    7   11    8   11   18   21   31     116        14.5
June 28   Ear height        7   19   27   23   16   10    6    4     112        14

Fig. 18.1.—Distribution of Wheat Plants according to Height (Table 18.1).
(a) Distribution of Shoot Heights (31st May) in Ranks 1-8.
(b) Distribution of Ear Heights (28th June) in Ranks 1-8.

The divergence of the actual from the expected results is very striking,
and clearly cannot be attributed to fluctuations of sampling. It will be
seen that on 31st May, before the ears had formed, the observer was
strongly biased towards the taller shoots; whereas in June, after the
ears had formed, he was biased strongly towards a central position and
avoided short and tall plants.
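
One conventional way of putting a figure on this divergence is the chi-square statistic against equal expected frequencies. The text does not carry out this calculation here, so the following Python sketch is offered only as an illustration; the frequencies are those read from Table 18.1, and the function name `chi_square_uniform` is introduced for the purpose.

```python
# Frequencies of the eye-chosen plants in ranks 1 to 8, as read from Table 18.1.
observed = {
    "31st May": [9, 7, 11, 8, 11, 18, 21, 31],
    "28th June": [7, 19, 27, 23, 16, 10, 6, 4],
}

def chi_square_uniform(freqs):
    """Chi-square statistic against equal expected frequencies in every rank."""
    expected = sum(freqs) / len(freqs)
    return sum((f - expected) ** 2 / expected for f in freqs)

results = {date: chi_square_uniform(f) for date, f in observed.items()}
for date, value in results.items():
    print(date, round(value, 2))
```

The two values come out at roughly 33 and 36. With seven degrees of freedom the 5 per cent. point of chi-square is about 14.07, so both lie far beyond it. The calculation treats the individual choices as independent, which is itself an approximation.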

18.22. Sight is not the only sense which may bias a sampling method. 
In certain experiments counters of the same shape but of different colours 
were put into a bag and chosen one at a time, the counter chosen being 
put back and the bag thoroughly shaken before the next trial. On the 
face of it this appears to be a purely random method of drawing the 
counters. Nevertheless, there emerged a persistent bias against counters 
of one particular colour. After careful investigation the only explanation
seemed to be that these particular counters were slightly more greasy 
than the others, owing to peculiarities of the pigment, and hence slipped 
through the sampler’s fingers. 

The student may perform similar experiments for himself. One of 
the simplest is to ask a friend to recite “ at random ” one hundred digits, 
including zero, and then count the number of odd ones. If the numbers 
are really random, the number of even ones and odd ones should be about
equal, but there will frequently be found a bias one way or the other. 

18.23. Enough has been said to show that if we are to evolve a
satisfactory method of random sampling we must eliminate all personal 
choice. The method of selection must, therefore, follow some code of 
procedure which leaves nothing to the observer’s idiosyncrasies. 

It may sound a little paradoxical to obtain true randomness by
following rules of procedure. We are reminded of Bertrand's question: “How
can we talk of the laws of chance, which is the negation of all law?”
The ensuing sections will, it is hoped, remove any doubts on this head. 

Technique of Random Sampling. 

18.24. The methods adopted in any given case to ensure as far as
possible that the sampling is random depend to some extent on the size
and nature of the universe. Certain modes of procedure which are
convenient for small universes are not so for large universes. We shall also
see that sampling from a hypothetical universe has a special significance
and special difficulties of its own.

18.25. The criterion that every individual should have an equal
chance of being chosen may be put in a somewhat different form. If the
method of selection is independent of the properties of the sampled universe
which it is desired to investigate, there will, so far as those properties are
concerned, be no reason why one individual should be chosen rather than
another. Hence all values of the properties which occur in the universe
will have an equal chance of being chosen. If, therefore, we can produce
a mode of procedure which bears no relation to the properties of the
parent universe which we are discussing, we may expect that it will give
a random sample, so far as those properties are concerned.

18.26. We may now consider a few examples of the kind of procedure 
to which this rule leads. 

Suppose we wish to take a sample of the inhabitants of a street. 
They are already arranged in houses, and for the sake of simplicity we 
will take our problem to be that of selecting a number of houses, whose 
occupants will comprise our sample. 

Let us take as our rule of procedure the selection of every tenth house,
starting at some arbitrary point. Unless there are peculiar circumstances,
it is presumable that the properties we are investigating, which may, 
for instance, be income or size of family, are not grouped periodically 
along the street. The method of selection is then independent of the 
properties of the universe and the sampling will be random. 

If, however, the street were divided into blocks by cross-streets at 
every tenth house, so that every house in our sample was a corner house, 
and therefore, possibly, a shop, it is easy to see that the sample is no longer 
random. Shops occur, in fact, along that street with period ten, and 
since our method of selection has also that period, the method and the 
qualities under investigation are no longer independent. 
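
The danger can be made concrete with a small Python sketch. It assumes a hypothetical street of 100 houses in which every tenth house is a corner shop; the helper `every_kth` is introduced here only for illustration.

```python
def every_kth(houses, k, start):
    """Systematic selection: every k-th house, beginning at index `start`."""
    return houses[start::k]

# Hypothetical street of 100 houses in which every tenth house is a corner shop.
street = ["shop" if number % 10 == 0 else "dwelling" for number in range(1, 101)]

aligned = every_kth(street, 10, 9)   # houses 10, 20, ..., 100
offset = every_kth(street, 10, 4)    # houses 5, 15, ..., 95
```

Here `aligned` consists wholly of shops and `offset` wholly of dwellings, although shops make up one-tenth of the street. Whichever starting point is taken, the period of the selection coincides with the period of the quality under investigation, and neither sample represents the street.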

18.27. We might then fall back on a different method. If we take
a pack of plain cards, as similar as we can get them, we can make one card
correspond to one of the houses by writing on it the number of the house
in the street. The pack would then be a kind of miniature of the universe
for sampling purposes. We can draw a sample of houses by drawing a
sample of cards, and if we shuffle the pack well we have every reason to
hope that a random sample will result, for it is hard to imagine any way
in which the method of shuffling and drawing could be dependent on the
properties of the universe. It is not impossible to make it so, however.
For instance, if the ink with which we wrote the numbers on the cards was
slightly adhesive, the larger numbers would not be so easy to draw out
as the small ones, and we should tend to get houses at one end of the
street. If such houses were of the poorer class, our sample for the purpose
of investigating income would not be random.

Lottery Sampling. 

18.28. The method we have just described, of constructing a
miniature universe which is easily handled, is one of the most reliable methods
of drawing a random sample. It is the method usually adopted in drawing
the winning numbers in sweepstakes and lotteries. In such cases the
universe is the aggregate of persons owning tickets in the lottery. To
every member of this universe there corresponds a number, the totality
of which numbers, written on pieces of paper, comprises the miniature
universe. In practice, these pieces are placed in similar containers,
usually small metal cylinders, and thrown into a large rotating drum, in
which they are thoroughly mixed or “randomised.”

18.29. The practical difficulties of constructing the miniature
universe and of shuffling it are, however, severe if the parent universe is
at all large. The method is, of course, inapplicable on theoretical grounds
if the universe is not finite. To save the trouble of work with tickets it
is often possible to use numerical methods.

As a rather extreme case, let us consider a method of taking a random
sample of the universe of visible stars, which is finite. We will take a
star to be defined on the celestial sphere by latitude and longitude, and
will ignore difficulties arising from the existence of double stars or
unresolved objects. What we want, then, is a set of random pairs of
latitudes and longitudes. As a crude method we might take an atlas of
the world and choose the figures set out in the index for places arranged
alphabetically. But it is easy to see that this method is unsound; for
there will be more names associated with the more populous districts,
and hence the values given in the index will tend to cluster round certain
points and avoid others—there will be none in the middle of seas or at
the poles, so that the pole star has no chance of being selected.

Let us then take a set of statistical tables and open it haphazardly.
We shall be confronted with a page of figures, and if we take, say, the tenth
figure in each row we shall probably get a set of digits which are random.
Suppose the first ten digits obtained in this way were 7, 0, 4, 7, 9, 6, 8,
2, 9, 1. We might then take our star to be defined by latitude 70° 47.9'
and longitude 68° 29.1'. Another page will give us another star, and
so on.
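
The conversion of ten digits into a position can be written out as a short Python sketch. The function name `position_from_digits` is introduced here for illustration, and the sketch simply follows the rule of the text (two digits of degrees, then minutes to one decimal place) without checking that the result is a possible latitude.

```python
def position_from_digits(digits):
    """Turn ten digits into (degrees, minutes) pairs for latitude and longitude."""
    def angle(d):
        degrees = 10 * d[0] + d[1]
        minutes = 10 * d[2] + d[3] + d[4] / 10
        return degrees, minutes
    return angle(digits[:5]), angle(digits[5:])

# The ten digits supposed in the text.
latitude, longitude = position_from_digits([7, 0, 4, 7, 9, 6, 8, 2, 9, 1])
```

For these digits the function gives 70 degrees 47.9 minutes and 68 degrees 29.1 minutes, as in the text.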

Tippett’s Numbers. 

18.30. The difficulty in applying the method we have just described
lies in ensuring that the numbers we obtain are really random. Many
tables of figures, such as logarithm tables, may fail to give random digits
because there is a relation between the figures in successive rows. To
obviate this difficulty certain Tables of Random Sampling Numbers have
been constructed by L. H. C. Tippett, by whose name they are known
(ref. (005)).

Tippett’s numbers consist of 41,600 digits taken from census reports
and combined by fours to give 10,400 four-figure numbers. We give here
the first forty sets as an illustration of their general appearance:

2952   6641   3992   9792   7979   5911   3170   5624
4167   9524   1545   1396   7203   5356   1300   2693
2370   7483   3408   2762   3563   1089   6913   7691
0560   5246   1112   6107   6008   8126   4233   8776
2754   9143   1405   9025   7002   6111   8816   6446


The reader may wonder how it was ensured that these digits are random.
They were chosen haphazard, but the real guarantee of their randomness
lies in practical tests. We may say at once that Tippett's numbers have
been subjected to numerous investigations which make their randomness
for many practical cases highly probable. Their use will be apparent from
the following examples:

Example 18.1.—To take a random sample of 10 from the universe of
8585 men of Table 6.7, page 94.

Here we have 8585 individuals. We will number them from 1 to 8585.
The problem of selecting ten men at random is then that of finding ten
numbers at random between 1 and 8585. We therefore take a page of
Tippett’s numbers and select the first ten on the page which are not
greater than 8585. Thus, if our page were the one on which appear the
numbers we have quoted above, our individuals would be those
corresponding to the numbers, reading across,

2952, 6641, 3992, 7979, 5911, 3170, 5624, 4167, 1545, 1396

If we imagine the numbering to be done in order of height, starting with
the shortest and ending with the tallest, we see that the first individual falls
in the group 66-, the second in the group 69-, and so on. The height-ranges
in which the ten individuals fall are, in fact, in inches:

66-, 69-, 67-, 71-, 68-, 66-, 68-, 67-, 65-, 65-

Let us take their heights as being given by the centre points of these ranges,
and find their mean. We have:

$M - \tfrac{7}{16} = \tfrac{1}{10}(66 + 69 + \dots + 65) = 67.2$

Hence the mean is 67.6 inches, as against the true value of 67.46 inches in
the whole universe.
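
The selection and the arithmetic of this example can be retraced in a few lines of Python. The sketch below assumes the first sixteen numbers quoted in 18.30 and takes the centre of each height-range to lie 7/16 inch above its lower limit, as the working above appears to do; the variable names are introduced only for illustration.

```python
# First two rows of the numbers quoted in 18.30, read across.
numbers = [2952, 6641, 3992, 9792, 7979, 5911, 3170, 5624,
           4167, 9524, 1545, 1396, 7203, 5356, 1300, 2693]

universe_size = 8585

# Keep the first ten numbers that correspond to some individual.
chosen = [n for n in numbers if 1 <= n <= universe_size][:10]

# Lower limits (inches) of the height-ranges found for the ten men in the text.
lower_limits = [66, 69, 67, 71, 68, 66, 68, 67, 65, 65]

# Mean of the centre points, each taken as 7/16 inch above the lower limit.
mean_height = 7 / 16 + sum(lower_limits) / len(lower_limits)
```

The numbers 9792 and 9524 are passed over because they exceed 8585, and the mean comes to about 67.64 inches, which the text gives to one decimal place.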

Example 18.2.—To take a sample of 5 from the distribution of screw
lengths of Table 6.3, page 84.

Here we have 206 individuals. It would clearly be a waste to use only
numbers from 0001 to 0206 for the screws and to neglect the rest, and we
are able to bring nearly all numbers into play by the following device.
We note that 206 goes 48 times into 10,000, with a certain remainder. In
fact, 206 × 48 = 9888. We therefore attach 48 numbers to each screw.
Taking them in order, beginning at the shortest, we let the first screw
correspond to the numbers 0001 to 0048, the second to 0049 to 0096, the
third to 0097 to 0144, and so on, the 206th screw corresponding to the
numbers 9841 to 9888. Numbers above 9888 we leave out of account.
Referring to the table, we see that there is one screw in the first category
(5 to 6 thousandths short of an inch), four in the second (4 to 5 thousandths
short of an inch), and so on. The numbers corresponding to screws in the
different categories will then be 0001-0048, 0049-0240, 0241-0768, and
so on; or, in tabular form,


Difference in Length                      Difference in Length
from 1 inch           Numbers             from 1 inch           Numbers
(thousandths).        Corresponding.      (thousandths).        Corresponding.

-6 to -5              0001-0048           +1 to +2              5857-7488
-5 to -4              0049-0240           +2 to +3              7489-8688
-4 to -3              0241-0768           +3 to +4              8689-9456
-3 to -2              0769-1824           +4 to +5              9457-9840
-2 to -1              1825-3024           +5 to +6              9841-9888
-1 to  0              3025-4320
 0 to +1              4321-5856


We now take five Tippett numbers from the tables. For instance,
we might take the five in the first column of 18.30, i.e. 2952, 4167, 2370,
0560, 2754. The screws corresponding to these numbers will be 1.5, 0.5,
1.5, 3.5 and 1.5 thousandths short of the inch respectively.

If we had obtained two numbers, say 0001 and 0002 in the first category, 
we should have been faced with the necessity for a decision on how the 
sampling was to be regarded, for there is only one screw in this category. 
If we suppose that a sampled screw is abstracted from the universe, it can 
only be drawn once ; and hence we should have had to ignore all numbers 
in the category 0001 to 0048 subsequent to that which first occurs. If, on 
the other hand, the screw is replaced, we can draw it as often as we like. 






Example 18.3.—In Example 3.5, page 40, we had the following data
giving the association between inoculation against cholera and exemption
from attack in 818 subjects:

                   Not attacked.       Attacked.          Total.
Inoculated         276 (0001-3312)     3 (3313-3348)      279
Not inoculated     473 (3349-9024)     66 (9025-9816)     539
Total              749                 69                 818

Let us take a sample of 10 from this universe. 

We observe that 818 goes into 10,000 twelve times, with a certain
remainder. In fact, 10,000 = 12 × 818 + 184. We can therefore attach
12 Tippett numbers to each member of the universe. To the 276
inoculated-not-attacked individuals we attach the numbers 0001 to 3312 (12 × 276).
To the 3 inoculated-attacked individuals we attach the numbers 3313 to
3348 (a range of 36, equal to 3 × 12). Similarly for the remaining
individuals. The Tippett numbers corresponding to the individuals in the four
compartments of the table are shown in brackets above.

We then take ten random sampling numbers from the tables, say the
first ten, reading across, from the numbers given on page 341. If we had
come across a number greater than 9816 we should have ignored it. The
first number, 2952, gives us an individual falling in the
inoculated-not-attacked class; the second, 6641, gives us a member of the
not-inoculated-not-attacked class; and so on. The 10 numbers give the following
results:



                   Not attacked.     Attacked.     Total.
Inoculated               2               0            2
Not inoculated           6               2            8
Total                    8               2           10

Example 18.4.—Strictly speaking, Tippett’s numbers are applicable
only to sampling from a finite universe, for we cannot attach a different
Tippett number to each member of an infinite aggregate. But, by the
following device, we can apply the Tippett tables to draw samples from a
continuous (and therefore infinite) universe which is specified by a
mathematical equation in such a way as to give us the proportion of the total
frequency in given ranges of the variate.

In fact, let us draw a sample from a normal universe with unit standard 
deviation and unit total frequency. 






Let us take ranges of 0.1 on each side of the central ordinate. Table 2
of the Appendix will then give us the proportion of the frequency lying
in these ranges. As in Example 18.1, we divide up the numbers from
0000 to 9999 in proportion to these frequencies, and this is, in fact, a
particularly simple matter. All we have to do, for the positive values of the
variate, is to take the figures in the second column (areas) and round them
up to four figures. E.g. for the first interval 0.0 to 0.1, there will
correspond the numbers 5000 to 5398; to the interval 0.1 to 0.2, the numbers
5399 to 5793; to the interval 0.2 to 0.3, the numbers 5794 to 6179; and
so on. For the negative values of the variate we have, similarly, for 0.0
to -0.1, the numbers 4601 to 4999; for -0.1 to -0.2, the numbers 4206
to 4600; for -0.2 to -0.3, the numbers 3820 to 4205; and so on, there
being as many numbers in any negative range as in the corresponding
positive range. Occasionally doubt may arise in assigning a number to a
given interval owing to the difficulty of rounding up a figure ending in 5.
In practice it is not likely to make any difference which interval we
choose; if it threatens to do so, we can take the doubtful number to refer
alternately to the two possible intervals.

Having assigned numbers to the ranges, we sample from Tippett's
tables in the ordinary way. For instance, a number 5500 will correspond
to a member in the range 0.1 to 0.2. If we wish to ascertain the mean of a
sample, or some similar function of the variate values, we take the variate
value of any individual to be the centre of the interval in which it falls.
This is an approximation, but the narrowness of the intervals justifies it in
most practical cases.
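
The device can be sketched in Python, using the standard library's `statistics.NormalDist` in place of the printed table of areas. The function name `interval_for` is hypothetical, and areas are rounded to the nearest four-figure number, which is an assumption about how the rounding in the text is to be read.

```python
from statistics import NormalDist

_phi = NormalDist().cdf  # area of the standard normal curve to the left of x

def interval_for(number):
    """Range of width 0.1 to which a four-figure number (0 to 9999) is assigned."""
    positive = number >= 5000
    m = number if positive else 9999 - number  # mirror the lower half on the upper
    k = 0
    # Advance until the rounded area up to the end of the k-th range covers m.
    while round(10_000 * _phi((k + 1) / 10)) < m:
        k += 1
    low, high = k / 10, (k + 1) / 10
    return (low, high) if positive else (-high, -low)
```

With this rule `interval_for(5500)` gives the range 0.1 to 0.2, as in the text, and `interval_for(4600)` gives -0.2 to -0.1. A figure ending in 5 is settled here by Python's own rounding rule rather than by alternation between the two possible intervals.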

Further examples will be found in a note by Karl Pearson prefixed
to the tables of Tippett numbers themselves. It may be remarked that
the tables may be used to give more than 10,400 sets of random
four-figure numbers; we may, for example, construct additional sets by
reading the numbers downwards, or taking every other digit diagonally,
and so on.

Sampling from Infinite Universes. 

18.31. The methods we have just been discussing are appropriate
only to those cases in which the universe is finite, so that it was possible
to associate with each individual one or more Tippett numbers; or to
universes which, though infinite, can be treated by the method of Example
18.4 owing to their complete specification according to the variate under
discussion. The required conditions are met with in much of the material
treated in practice, particularly in demographic and economic work;
but in other work the universe may be either infinite or so large as to be
infinite for all practical purposes, and a different technique must therefore
be used.

Consider, for example, the problem of drawing a random sample from 
a sack of flour. We clearly cannot number all the particles in the sack, 
nor could we extract any given particles and examine them. We might, 
perhaps, reduce this case to that of a finite universe by weighing out the
flour into small, say one-ounce, packets and then sampling the packets. 
This is a kind of mixed sampling. But it is also possible to handle the 
problem by a special technique, as follows. 

First of all, we mix the flour thoroughly. We then divide it into
two halves and select one half. (It does not matter which, but for
convenience we may imagine two heaps, one on the right and one on the left,
and select left and right alternately.) We then divide the half we have
chosen into two further halves, and again select one. The process is
continued until the sample has reached a manageable size. We may
reasonably suppose that it is random, especially if the flour is well mixed
at each stage before being divided into two.
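
The procedure of continued halving can be imitated in Python on a finite stand-in for the sack. The sketch assumes the sack to be represented by 4480 numbered one-ounce portions and a manageable size of 100 portions; both figures, and the function name `halve_until`, are chosen purely for illustration.

```python
import random

def halve_until(material, target_size, rng):
    """Mix, split into two heaps, keep left and right alternately, and repeat."""
    heap = list(material)
    keep_left = True
    while len(heap) > target_size:
        rng.shuffle(heap)                  # mix thoroughly before dividing
        middle = len(heap) // 2
        heap = heap[:middle] if keep_left else heap[middle:]
        keep_left = not keep_left
    return heap

# Hypothetical sack represented by 4480 numbered one-ounce portions.
sample = halve_until(range(4480), 100, random.Random(2))
```

Six halvings reduce the 4480 portions to 70, at which point the process stops.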

A similar technique may be used for many “continuous” substances,
such as milk, grain, cement, etc. 

Sampling from Hypothetical Universes. 

18.32. The technique for drawing random samples brings out a
fundamental difference between existent and hypothetical universes.
Taking a simple but typical case, let us draw a sample from the universe
of throws of a die.

The methods we have previously used are quite obviously inapplicable
here. We cannot construct a card universe, because we do not know
the nature of the parent universe. Nor can we put all the possible throws
in a heap, and select from it by continued subdivision. In fact, there is
only one thing we can do, and that is to throw the die, and take our
results as a sample.

What reason have we to suppose that this is a random sample? The
answer lies partly in theory and partly in technique. In the first place,
we must adapt our method of throwing so that the sampling conditions,
so far as we can see, remain constant throughout the experiment. This
is a matter of technique, and our methods can, in fact, be tested. But
since our universe does not exist for us to examine separately, the only
knowledge about it being derived from the sample itself, it will be clear
on a little reflection how difficult it is to say that every other possibility
in the universe had an equal chance of occurring. We return to this
point in 18.35 and 18.36 below.

The Importance of Random Sampling. 

18.33. We have already remarked on the importance of being able 
to gauge the error of an estimate made from a sample. The practical 
use of the theory of random sampling lies largely in the fact that it allows 
us to measure objectively, in terms of probability, errors of estimation or
the significance of a result obtained from a random sample. The purposive 
methods to which we refer below do not do this, or at least have not yet 
been made to do so. The present trend among statisticians is, therefore, 
on the whole, in favour of the use of random sampling methods except in 
certain special cases. 

18.34. At this point we may bring forward two important
considerations.

In the first place, it must not be forgotten that random sampling may
produce the most unrandom-looking results. For instance, we usually
regard a hand of cards at bridge as a random sample from the universe
of 52 which comprise the pack; but it is not unknown for a hand of
13 spades to be dealt. The fact that the sample looks purposive,
therefore, proves nothing. But it does provide a basis for strong presumptions.
How strong those presumptions may be the student may judge for himself
by imagining what he would think of a card party at which he got 13
spades twice in succession!

Secondly, we can never be absolutely certain that a method of sampling 
is random. There are doubts on a priori grounds because for any given 
method there are always conceivable sources of bias, and we can never 
rule out entirely the possibility that some of these sources are present. 
The utmost we can do is to make their presence extremely unlikely by 
taking great care with the experiment. 

18.35. We can, however, apply tests to judge the randomness of a 
sampling method. If we draw a single sample from a known universe, 
the result will tell us nothing about the method adopted ; but if we take 
a large number of samples they should, if the sampling is random, be 
distributed in a certain way, and for some universes we can calculate 
mathematically what that way ought to be. If, therefore, we apply our 
sampling method to such a parent universe and find the results widely 
divergent from expectation, we have every reason to suspect our sampling 
technique. Per contra, if the results and expectation are in accord, there 
is good ground for reliance on the sampling. 
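
The test described in 18.35 can be sketched numerically. The following is an illustration only, not a procedure given in the text: the universe of 6 successes and 4 failures, the sample size, the seed and the helper names are all assumptions, and the discrepancy measure is a chi-square-style sum chosen for convenience. The method under test here is Python's own random chooser, drawing with replacement.

```python
import random
from math import comb

def binomial_expected(n_samples, n, p):
    """Expected number of samples showing k successes, for k = 0..n."""
    q = 1 - p
    return [n_samples * comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

def observed_counts(universe, n_samples, n, rng):
    """Draw n_samples samples of size n (with replacement) and tally successes."""
    counts = [0] * (n + 1)
    for _ in range(n_samples):
        successes = sum(rng.choice(universe) for _ in range(n))
        counts[successes] += 1
    return counts

# Hypothetical known universe: 6 members with the attribute (1), 4 without (0).
universe = [1] * 6 + [0] * 4
rng = random.Random(1940)  # fixed seed so that the run is repeatable
observed = observed_counts(universe, n_samples=2000, n=5, rng=rng)
expected = binomial_expected(2000, 5, p=0.6)

# Chi-square-style discrepancy between observation and expectation.
discrepancy = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
for k, (o, e) in enumerate(zip(observed, expected)):
    print(k, o, round(e, 1))
print("discrepancy:", round(discrepancy, 2))
```

A large discrepancy would give reason to suspect the sampling technique; a small one is consistent with, but does not prove, randomness.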

18.36. Tests of this kind presuppose that we know the form of the 
parent universe. In sampling from a hypothetical universe we do not 
know this, and are forced to estimate it from the sample. Clearly, we 
cannot use this estimate to criticise the method by which the sample was 
obtained without some closer inquiry. 

Similar problems may arise for existent universes when we do not 
know the nature of the parent universe but have to estimate some or all 
of its characteristics from the data of the sample. In such cases it is 
extremely difficult to be completely satisfied that the sampling is random. 
Frequently the best we can do is to use a method which has been found 
satisfactory for other universes and hope, in the absence of any indication 
to the contrary, that it will also be satisfactory for the present universe. 

Purposive Sampling. 

18.37. We have already pointed out the dangers of introducing bias 
if the observer gives rein to his inclinations in choosing a sample, and 
have stressed the fact that in general there does not exist a method of 
assessing the degree of accuracy of an estimate made from a purposive 
sample. In spite of these handicaps, however, there are cases where 
purposive selection is a useful method. In this book we shall not consider 
it in any great detail, because the reliance placed upon it depends 
largely on the circumstances of the case, remains to a great extent a 
matter of personal opinion, and is not capable of being discussed by 
elementary methods. Nevertheless, our brief survey would be incomplete 
without some reference to it. 

18.38. Let us first of all consider the case of an observer who wishes 
to take a sample of two or three turnips from a cart-load. A random 
sample might give us several very large or very small turnips, though it 
is unlikely to do so. But if we allow the observer to run his eye over the 
whole load and then choose, he is most likely to take what he regards as 
average turnips— i.e. average in size, weight, shape, and whatever other 
quality may be in his mind. 

It may be claimed, with some plausibility, that this purposive method 
is more likely to give us a sample which is typical or representative of the 
universe than a random method. The random sample may vary widely 
from the average, whereas the purposive sample does not. This gives 
the latter an advantage as a rule ; but it may be pointed out: 

(a) That as the sample becomes larger the random sample becomes 
more and more representative of the parent, whereas, owing to bias, the 
purposive sample in general does not. 

(b) That in many cases the object of the sample is to give us information 
about the whole of the universe ; the purposive sample might tell us more 
about the mean weight of the turnips, but would probably give a worse 
idea of the variance of the weights because the observer has deliberately 
chosen values near the mean. 
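
Point (b) may be illustrated by a small numerical sketch. It is not taken from the text: the universe of 1000 turnip weights is invented, and the observer is imitated, rather crudely, by picking the 30 weights nearest to the universe mean.

```python
import random
import statistics

rng = random.Random(18)  # fixed seed so that the run is repeatable

# Hypothetical universe: weights (in ounces) of 1000 turnips.
universe = [rng.gauss(16.0, 4.0) for _ in range(1000)]
mean_u = statistics.fmean(universe)
sd_u = statistics.pstdev(universe)

# A random sample of 30 turnips.
random_sample = rng.sample(universe, 30)

# A "purposive" sample: the 30 turnips closest to the universe mean, a crude
# stand-in for an observer who chooses what he regards as average turnips.
purposive_sample = sorted(universe, key=lambda w: abs(w - mean_u))[:30]

print("universe ", round(mean_u, 2), round(sd_u, 2))
print("random   ", round(statistics.fmean(random_sample), 2),
      round(statistics.pstdev(random_sample), 2))
print("purposive", round(statistics.fmean(purposive_sample), 2),
      round(statistics.pstdev(purposive_sample), 2))
```

Under these assumptions the purposive sample reproduces the mean closely but gives a standard deviation far below that of the universe.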

18.39. If we had to choose between pure random sampling and 
purposive sampling, our choice would probably be determined by balancing 
the uncertainties of the former, which are mainly due to fluctuations of 
chance, and the uncertainties of the latter, which are mainly due to bias. 
In practice, however, it is often possible to combine the two methods 
in stratified sampling and gain some of the advantages of each while 
minimising their disadvantages. 

The essentials of this process lie in dividing the parent population into 
strata and taking a random sample from each stratum. For instance, if 
we are taking a sample of earned incomes, we might first group individuals 
into classes “ earning up to £500 per annum,” “ earning from £500 to 
£1000 per annum,” and so on, and then choose a random sample from each 
class. Or, if we wanted a sample of farms in Great Britain, we might first 
classify them roughly as “devoted mainly to arable crops,” “devoted 
mainly to milk production,” “ devoted mainly to vegetable growing,” etc., 
and again take a random sample from each group. 
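
The process of 18.39 may be sketched as follows. The income data, the class boundaries beyond those quoted above, and the function names are hypothetical, introduced only to show the two steps: division into strata, then a random sample from each stratum.

```python
import random
from collections import defaultdict

def stratified_sample(individuals, stratum_of, per_stratum, rng):
    """Group individuals into strata, then draw a random sample from each."""
    strata = defaultdict(list)
    for item in individuals:
        strata[stratum_of(item)].append(item)
    return {
        name: rng.sample(members, min(per_stratum, len(members)))
        for name, members in strata.items()
    }

def income_class(income):
    """Hypothetical income classes, in pounds per annum."""
    if income <= 500:
        return "up to 500"
    if income <= 1000:
        return "500 to 1000"
    return "over 1000"

rng = random.Random(39)
incomes = [rng.randint(50, 3000) for _ in range(500)]  # invented data
sample = stratified_sample(incomes, income_class, per_stratum=5, rng=rng)
for name, chosen in sample.items():
    print(name, chosen)
```

The choice of strata is the purposive element; the selection within each stratum is left to chance.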

18.40. Finally, we may also sample a universe by first of all arranging 
its individuals in groups. This amounts to taking a different sampling 
unit. For instance, in sampling the population of Great Britain we might, 
as a matter of convenience, take streets or local government districts 
instead of individual human beings as our unit. We have already had an 
instance of this type when we suggested as one way of sampling a sack of 
flour that it might be weighed out first into one-ounce packets. The 
process is obviously more convenient when this grouping has been done 
for us, e.g., in census returns. 

18.41. Each branch of science and industry presents its own sampling 
problems, and it would be difficult to expand the foregoing discussion so as 
to include the detailed requirements of the worker in every sphere. We 
will conclude this chapter with an example of the way in which all the 
methods we have described may be pressed into service in order to give a 
sample which is as representative as practical limitations will allow. 

It is the practice in England for manufacturers of sugar from sugar beet 
to pay the growers according to the sugar content of their product. The 
beet, which is not unlike a parsnip, is delivered to the factory in lots of at 
least several tons with a certain amount of waste material, such as earth, 
adhering to it. The problem is, then, (a) to find the net weight of the beet 
when cleaned and ready for the slicing process, which is the first stage in 
the extraction of the sugar, and (b) to ascertain the sugar content. The 
method of procedure is as follows: 

The gross weight of the load of beet usually is first obtained by weighing 
the lorry which contains it when full, and when empty. From the middle 
of the load of beet is then abstracted about 28 pounds, which is carefully 
weighed, and then cleaned and weighed again. The difference in the 
weights gives the “ tare,” that is to say, the proportion of waste matter, 
and a proportional amount is deducted from the whole load to give the 
net weight of beet. This process is equivalent to taking a random sample 
and assuming that the value of the “ tare ” in the sample is the value in the 
whole universe. 

The sample of washed beet is then laid out on a table and arranged with 
the roots in order of size. From this sample a smaller sample is taken by 
choosing a beet every so often. This is a process of pure purposive 
selection. 

The reduced sample is still inconveniently large, so it is reduced by 
taking a slice from each beet. It is known that the sugar in the root is not 
distributed homogeneously (although it is roughly symmetrical about the 
axis of the root), so trained men are employed to slice one section with a rasp, 
the section being that which would be obtained by cutting the root from 
the thick end to the tapered end into two symmetrical halves and then 
repeating the process one or more times. This selection again is purposive 
in so far as the shape of the section is based on knowledge of the 
distribution of the sugar, but random in so far as it is a matter of chance 
what is the longitude of the particular slice chosen. 

When each beet has been treated in this way there is given a heap of 
pulp which may be analysed. The heap is, however, as a rule still too 
large. It is therefore well mixed and divided into four heaps. Two heaps 
are thrown away, one is reduced to 20 grammes and analysed by the factory 
and one, similarly reduced, is analysed by the grower’s representative. 
This last method of selection is a random method adapted for a universe 
which cannot readily be enumerated. 

The final sample therefore appears as the result of four successive 
sampling methods, two of which are random, one purposive, and one a 
mixture of purposive and random. 

SUMMARY. 

1. Sampling may be random, purposive or mixed. 

2. Random sampling owes its importance to the fact that we can assess 
the results obtained from it in terms of probability. 

3. The presence of an element of choice on the part of the observer 
introduces the danger of bias, and should not be permitted where it can be 
avoided. 

4. Random samples may conveniently be drawn by the use of card 
universes or of Tippett’s numbers. 

5. The sampling technique adopted in any given case will depend largely 
on the circumstances of that case and the resources of the observer. At 
the present time the reliability of estimates made from samples is partly a 
matter of individual opinion founded on intuitive ideas, unless the sampling 
methods are random. 





EXERCISES. 

18.1. Draw a random sample of 20 from the universe of men of the last 
column of Exercise 0.0 (inhabitants of the United Kingdom classified according 
to weight). Find the mean of the sample and compare it with the mean of 
the universe. 

18.2. Deal yourself a hand of 13 cards from an ordinary pack of 52 playing 
cards and count the number of court cards. Use your result to estimate the 
number of court cards in the whole pack. 

Repeat the experiment ten times, taking a new deal each time, and compare 
the mean of your results with the true value, 12. 

18.3. Suggest a method for obtaining a random sample of words from the 
English language by the use of Tippett's numbers and a dictionary. 

18.4. Draw a sample of 80 from the universe of the last column of Table 0.7, 
and find the standard deviation. Compare your result with the standard 
deviation of the universe. 

18.5. Suggest a possible source of bias in the following: 

(a) A barrel of apples is sampled by taking a handful from the top. 

(b) A mixture of sand and sawdust is sampled by scooping up a 
quantity from the bottom. 

(c) A set of digits is taken by opening a Telephone Directory at 
random and choosing the telephone numbers in the order in 
which they appear on the page. 

(d) Readers of a newspaper are sampled by printing in it an invitation 
to them to send up their observations on some topical event. 

(e) Investigators into the size of families in a town conduct a 
house-to-house inquiry (1) in the morning, (2) in the afternoon, 
ignoring those houses at which there is no reply. 

18.6. Draw 100 samples of 10 from a normal universe by means of Tippett's 
numbers, and form the frequency-distribution of their means. 

18.7. In the data obtained in Exercise 18.6, form the frequency-distribution 
of the root-mean-square deviations of the samples about the mean of the parent 
universe. 

18.8. Draw 100 samples of 10 from the Poisson universe of 10.47, page 101, 
and form the frequency-distribution of their means. 

18.9. Draw 500 samples of I from the universe of Australian marriages of 
Table 0.8, page 00, and form the frequency-distribution of their range. 

18.10. Draw a sample of 50 from the universe of Table 11.4, page 200 (4012 
dairy cows), and find the correlation in the sample between age in years and yield 
of milk per week. Compare your result with the correlation in the universe. 



CHAPTER 19. 


THE SAMPLING OF ATTRIBUTES: LARGE SAMPLES. 

The Problem. 

19.1. In dealing with the theory of sampling we shall find it convenient 
to preserve the formal distinction between attributes and variables 
which we drew earlier in this book. The theory of the sampling of 
attributes is in many respects simpler than that of variables, and in this 
chapter we shall confine ourselves to it. We shall begin by considering 
a type of sampling which we shall call simple, involving certain limitations 
on the generality of the problem, and shall then proceed to examine the 
removal of these limitations in order to deal with the general case. 

19.2. The sampling of attributes may be regarded as the drawing 
of samples from a universe containing A’s and not-A’s. The number of 
A’s in each sample, or the proportion of A’s, will form part of the data 
provided by the samples. 

We shall find it convenient to adopt the nomenclature of 10.3 and to 
speak of the drawing of an individual on sampling as an “event.” The 
appearance of the attribute A may be called a “success” and the 
non-appearance a “failure.” Thus, in sampling a human population for the 
proportions of the two sexes, we might say of a sample of 100, 45 of which 
were male, that the sample consisted of 100 events, 45 of which were 
successes and 55 failures. (It might, of course, be more convenient, 
and would certainly be more courteous, to reverse the names and call 
the occurrence of a female a “success” and of a male a “failure.”) 

Simple Sampling. 

19.3. By simple sampling we mean random sampling in which each 
event has the same chance p of success, and in which the chances of 
success of different events are independent, whether previous trials have 
been made or not. These conditions hold good, for instance, in the 
throwing of a die or the tossing of a coin; the chance of getting heads 
with a coin is not affected by what was obtained in the previous trials, 
and remains constant no matter how many trials are made, provided, of 
course, that the coin does not begin to wear or is not falsely manipulated 
by the experimenter. 

Simple sampling is a particular form of random sampling, as we have 
defined it in the previous chapter. Suppose, for example, we take a 
sample of two from a universe consisting of 6 men and 4 women under 
random sampling conditions, i.e. so that at each of the two events which 
constitute the sample every member of the universe has an equal chance 
of being chosen. If, at the first trial, we draw a man, the chance of doing 
so being 6/10, there will be 5 men and 4 women left in the universe, and 
the chance of obtaining a man on the second trial will be 5/9. This is not 
the same as the chance on the first trial, and hence the sampling is not 
simple, though it is random. 
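
The arithmetic of this illustration can be checked with exact fractions. The sketch below assumes only the universe of 6 men and 4 women described above; the variable names are ours. It contrasts drawing without replacement, which is random but not simple, with drawing with replacement, which is simple.

```python
from fractions import Fraction

men, women = 6, 4
total = men + women

# Without replacement: the chance changes once a man has been drawn.
first = Fraction(men, total)                      # first trial
second_given_man = Fraction(men - 1, total - 1)   # second trial, a man drawn first

# With replacement: the chance is the same at every trial.
second_with_replacement = Fraction(men, total)

print(first, second_given_man, second_with_replacement)  # 3/5 5/9 3/5
```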




Mean and Standard Deviation in Simple Sampling of Attributes. 

19.4. Suppose now that we take N samples with n events in each. 
The chance of success of each event is p and of its failure q = 1 - p. As 
in 10.6, the frequencies of samples with 0, 1, 2, . . . successes are the 
terms in the series $N(q + p)^n$, i.e. 

\[ N\left\{ q^n + n q^{n-1} p + \frac{n(n-1)}{2!}\, q^{n-2} p^2 + \dots + n q p^{n-1} + p^n \right\} \]

As in 10.9, this distribution has mean M given by 

\[ M = np \]

and standard deviation (10.10) 

\[ \sigma = \sqrt{npq} \tag{19.1} \]
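
As a check on these two results, the mean and standard deviation may be computed directly from the terms of the binomial series. This is a verification sketch, not part of the original argument; the function names are ours and the values n = 12, p = 1/2 are chosen merely for illustration.

```python
from math import comb, sqrt

def binomial_terms(n, p):
    """Terms of (q + p)^n: the chances of 0, 1, ..., n successes."""
    q = 1 - p
    return [comb(n, k) * q**(n - k) * p**k for k in range(n + 1)]

def mean_and_sd(n, p):
    """Mean and standard deviation computed term by term."""
    terms = binomial_terms(n, p)
    mean = sum(k * t for k, t in enumerate(terms))
    variance = sum((k - mean) ** 2 * t for k, t in enumerate(terms))
    return mean, sqrt(variance)

n, p = 12, 0.5
mean, sd = mean_and_sd(n, p)
print(mean, n * p)                    # both equal np
print(sd, sqrt(n * p * (1 - p)))      # both equal sqrt(npq)
```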

19.5. In lieu of recording the number of successes in each sample 
we might have recorded the proportion of successes, that is, 1/n of the 
number in each sample. As this would amount to dividing all figures 
of the record by n, the mean proportion of successes must be p, and the 
standard deviation of the proportion of successes is given by 

\[ s = \frac{\sigma}{n} = \sqrt{\frac{pq}{n}} \tag{19.2} \]

Equations (19.1) and (19.2) are of fundamental importance. 

Example 19.1.—The following results, due to Weldon, are of interest. 
Weldon threw 12 dice 4096 times, a throw of 4, 5 or 6 being called a 
success. We have, then, 4096 samples of 12 from the universe consisting 
of all possible throws of the dice. 

If the dice are all true, the chance of success is 1/2. Hence, the 
theoretical mean M = 6; theoretical value of the standard deviation 
$\sigma = \sqrt{0.5 \times 0.5 \times 12} = 1.732$. 

The following was the frequency-distribution observed : 

Successes.   Frequency.   Successes.   Frequency. 
    0            -            7           847 
    1            7            8           536 
    2           60            9           257 
    3          198           10            71 
    4          430           11            11 
    5          731           12            - 
    6          948          Total        4096 

Mean M = 6.139, standard deviation σ = 1.712. The proportion of 
successes is 6.139/12 = 0.512 instead of 0.5. 
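
The observed mean and standard deviation may be recomputed from the frequency-distribution. In the sketch below the frequency for 4 successes is taken as 430, which is the value consistent with the stated total of 4096; the variable names are ours.

```python
from math import sqrt

# Observed frequencies for 0, 1, ..., 12 successes (Weldon's dice data).
freq = [0, 7, 60, 198, 430, 731, 948, 847, 536, 257, 71, 11, 0]

total = sum(freq)
mean = sum(k * f for k, f in enumerate(freq)) / total
sd = sqrt(sum(f * (k - mean) ** 2 for k, f in enumerate(freq)) / total)

print(total)                 # 4096
print(round(mean, 3))        # 6.139
print(round(sd, 3))          # 1.712
print(round(mean / 12, 3))   # 0.512
```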

Example 19.2.—(G. U. Yule.) The following may be taken as an illustration 
based on a smaller number of observations : Three dice were thrown 
648 times, and the numbers of 5’s or 6’s noted at each throw. p = 1/3, 
q = 2/3; theoretical mean 1; standard deviation 0.816. 

Frequency-distribution observed: 

Successes.   Frequency. 
    0           179 
    1           298 
    2           141 
    3            30 
  Total         648 

M = 1.034, σ = 0.823. Actual proportion of successes 0.345. 

19.6. The value pn is sometimes called the “expected” value of 
the number of successes in the sample. It is not only the mean value of 
all samples, but is the most probable value and is also representative, i.e. 
it bears the same ratio p to the number in the sample as the number 
of individuals with attribute A in the universe bears to the total number 
in the universe. The divergences of the number of successes from the 
expected value in any given random sample give rise to what we have 
hitherto called fluctuations of random sampling. They are to be regarded 
as deviations due to the nature of the sampling process, and not indicative 
of any real properties of the universe itself. 

19.7. Equations (19.1) and (19.2) enable us to deal with the question 
which has arisen several times in earlier chapters of this book, namely, 
when can we say that observed deviations from the expected values in 
a sample of attributes are due to some real effect and are not merely 
attributable to sampling fluctuations ? 

The binomial distribution, to which samples classified according to 
the frequencies of an attribute give rise, is a single-humped type which 
approximates very closely to the normal for large values of n, the number 
in the sample. It follows that the great majority of its members lie 
within a range $\pm 3\sigma$ on each side of the mean, i.e. of $\pm 3\sqrt{npq}$ on each 
side of the value np. If the distribution is exactly normal, 0.9973 of the 
curve lies within this range (10.29). We can therefore say that if a 
particular sample gives a value of p outside this range, the deviation from 
the expected value is most unlikely to have arisen from fluctuations of 
simple sampling. If n is large, the chances are about 3 in a thousand 
that it arose in that way. 

It must be emphasised that the free use of the 3σ rule is justified only 
if n is large. 
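
The figure 0.9973 and the dependence of the rule on n may be examined numerically. The sketch below is an illustration under our own assumptions (p = 1/2 and the three sample sizes shown); it compares the normal area within three standard deviations with the exact binomial chance of falling within the same range.

```python
from math import comb, sqrt
from statistics import NormalDist

# Area of the normal curve within 3 standard deviations of the mean.
normal_within = NormalDist().cdf(3) - NormalDist().cdf(-3)

def binomial_within_3_sigma(n, p):
    """Exact binomial chance that the number of successes lies
    within np +/- 3*sqrt(npq)."""
    q = 1 - p
    centre, spread = n * p, 3 * sqrt(n * p * q)
    return sum(
        comb(n, k) * p**k * q**(n - k)
        for k in range(n + 1)
        if abs(k - centre) <= spread
    )

print(round(normal_within, 4))   # 0.9973
for n in (10, 100, 1000):
    print(n, round(binomial_within_3_sigma(n, 0.5), 4))
```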

Example 19.3.—In the experiments of Example 19.1, 25,145 throws of 
a 4, 5 or 6 were made out of 49,152 throws altogether. The chance of 
throwing one of these numbers is 1/2, and hence the expected value is 24,576. 
The observed number was thus 569 in excess of this. Can the deviation 
from the expected value be due to fluctuations of simple sampling ? 

The standard deviation of simple sampling is 

\[ \sigma = \sqrt{npq} = \sqrt{\tfrac{1}{2} \times \tfrac{1}{2} \times 49152} = 110.9 \]

The deviation observed is 5.13 times this quantity, and it is therefore 
most improbable that it arose as a sampling fluctuation. We must therefore 
seek some other explanation of the deviation, and it seems reasonable 
to suspect that the dice were slightly biased. 

The problem might, of course, have been attacked equally well from 
the standpoint of proportion instead of the actual numbers of successes. 
This proportion is 0.5116 instead of the expected 0.5000, the difference in 
excess being 0.0116. The standard deviation of the proportion is 

\[ s = \sqrt{\tfrac{1}{2} \times \tfrac{1}{2} \times \tfrac{1}{49152}} = 0.00226 \]

and the difference observed is 5.13 times this, which is the same ratio as 
before, as of course it must be. 


Example 19.4.—(Data from the Second Report of the Evolution Committee 
of the Royal Society, 1905, p. 72.) 

Certain crosses of the pea, Pisum sativum, gave 5321 yellow and 1804 
green seeds. The expectation is 25 per cent. of green seeds on a Mendelian 
hypothesis. Can the divergences from the expected values have arisen 
from fluctuations of simple sampling only ? 

The numerical difference from the expected result is 23. The standard 
deviation of simple sampling is 

\[ \sigma = \sqrt{0.25 \times 0.75 \times 7125} = 36.6 \]

The divergence from theory is only about 0.6 of this, and hence may 
very well have arisen from fluctuations of simple sampling. 
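
The comparisons made in Examples 19.3 and 19.4 follow one pattern, which may be sketched as a small function. The function name is ours; for Example 19.4 the count of green seeds is taken as 1804 out of 7125, the total used in the formula above, and that reading is an assumption.

```python
from math import sqrt

def deviation_in_standard_errors(successes, n, p):
    """Return (deviation, standard deviation of simple sampling, ratio)
    for an observed number of successes in n events with chance p."""
    expected = n * p
    sd = sqrt(n * p * (1 - p))
    deviation = successes - expected
    return deviation, sd, deviation / sd

# Example 19.3: 25,145 successes in 49,152 throws, p = 1/2.
print(deviation_in_standard_errors(25145, 49152, 0.5))

# Example 19.4: green seeds, expectation 25 per cent (count assumed as 1804).
print(deviation_in_standard_errors(1804, 7125, 0.25))
```

A ratio well above 3 points away from simple sampling as the explanation; a ratio below 1, as in the second case, is entirely compatible with it.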

Standard Error. 

19.8. We shall very frequently have to use the standard deviation of 
sampling, and it is convenient to have a shorter name for this quantity. 
We shall call it the standard error. The use of the word error is justified 
in this connection by the fact that we usually regard the expected value 
as the true value, and divergences from it as errors of estimation due to 
sampling effects ; but the student should not attach too much significance 
to the particular term “error.” 

In most of our work the term “standard error” will be applied to the 
standard deviation of simple sampling ; but it has a rather wider meaning, 
embracing this one, which we shall discuss in considering the sampling of 
variables (20.22, cf. also 19.31). 

We may, then, summarise the foregoing in the statement that frequencies 
differing from the expected frequency by more than 3 times the 
standard error are almost certainly not due to fluctuations of sampling. 
They point to some departure of the sampling from simplicity, which may 
in turn point either to some flaw in the sampling technique or to causal 
effects in the universe itself. 


Probable Error. 

19.9. Instead of the standard error, some authorities have used a 
quantity called the probable error, which is 0.67449 times the standard 
error. This practice arose from the fact that in the normal curve the 
quartiles are distant 0.67449σ from the mean, so that the probability that 
a deviation is in excess of the probable error is 1/2 and is equal to the 
probability of a deviation being less than the probable error. The rule 
that the observed deviation should not be greater than 3 times the standard 
error is then approximately equivalent to a rule that it should not exceed 
4.5 times the probable error. 

The use of the probable error is declining, and we recommend the student 
to eschew it. 
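
The constant 0.67449 is simply the upper quartile of the standard normal curve, and may be recovered as follows. This is a check using Python's standard library, not a computation given in the text.

```python
from statistics import NormalDist

# Upper quartile of the standard normal curve: the deviation which is
# exceeded in absolute value with probability one half.
factor = NormalDist().inv_cdf(0.75)

print(round(factor, 5))       # 0.67449
print(round(3 / factor, 2))   # 4.45, i.e. roughly 4.5 probable errors
```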

19.10. In Examples 19.1 to 19.4 we dealt with cases where p, the 
probability of success, was known a priori. In many cases it is not known, 
and further consideration is necessary before we can apply equations (19.1) 
and (19.2) to such cases. 

To fix the ideas, let us suppose that we have a simple sample of 1000 
individuals from the inhabitants of Great Britain, and find that 36 per cent, 
of them have blue eyes and the remainder have eyes of some other colour. 
What can we infer about the proportion of blue-eyed individuals in the 
whole population ? 

In this instance we do not know the proportion p of blue-eyed individuals 
in the population. We do know that the standard error is 
$\sqrt{1000pq}$. Now, whatever p and q are, pq cannot exceed 1/4, and hence the 
standard error cannot exceed $\tfrac{1}{2}\sqrt{1000}$, or 16. Hence, whatever p is, a 
simple sample should give a number of successes within 3 times this, or 48, 
of the expected frequency pn. This is 4.8 per cent. of the sample, and we 
thus may say that the proportion of blue-eyed people in the whole population 
is 36 ± 4.8 per cent., i.e. that it lies between 31.2 and 40.8 per cent. 

19.11. We may, however, make a rather better estimate. We have 
seen that the standard error is small compared with the expected value, 
and hence with the observed value. If, therefore, in calculating the 
standard error we take the observed values of p and q in the sample instead 
of the unknown true values of p and q, we shall not involve ourselves in 
very great error. 

Thus, taking p to be 0.36, q = 0.64, 

\[ \sigma = \sqrt{npq} = \sqrt{0.36 \times 0.64 \times 1000} = 15.18 \]

Hence, 3σ = 45.5 approximately, and the limits are now 36 ± 4.6 or 31.4 
and 40.6, slightly narrower than those previously obtained. 
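
The two estimates of 19.10 and 19.11 may be set side by side in a short sketch. The function name and its optional argument are ours; the limits are returned as proportions, and since no intermediate rounding is performed they differ slightly in the last figure from those quoted above, where the standard error was rounded before use.

```python
from math import sqrt

def three_sigma_limits(successes, n, p_for_error=None):
    """Limits of 3 standard errors about the observed proportion.

    p_for_error is the value of p used in sqrt(pq/n); if it is omitted,
    the observed proportion is used, as in 19.11.
    """
    observed = successes / n
    p = observed if p_for_error is None else p_for_error
    se = sqrt(p * (1 - p) / n)
    return observed - 3 * se, observed + 3 * se

# Blue-eyed example: 360 out of 1000.
print(three_sigma_limits(360, 1000, p_for_error=0.5))  # widest limits, pq = 1/4
print(three_sigma_limits(360, 1000))                   # using the observed p
```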

19.12. In this example we have taken the proportion of successes in 
the sample to be an estimate of the proportion of successes in the universe, 
and have set limits to the range within which the true proportion probably 
lies. There are other reasons, of an advanced theoretical character which 
we shall not specify, for taking p in the sample as an estimate of p in the 
universe, but the student will probably concede that it is the most reasonable 
thing to do in the circumstances. We must, however, look a little 
more closely into the assumption that this estimate may be used in calculating 
the standard error. 

19.13. The assumption is a justifiable one if n is large and neither p 
nor q is small. For in such a case, the standard error of the proportion p 
is $\sqrt{pq/n}$, and this is small compared with p unless p itself is small. 
If, then, the standard error of p is small, the value of p estimated from 
the sample must be close to the real value, and we shall not introduce any 
serious error by taking the estimated value in evaluating the formula 
$\sqrt{pq/n}$. 

19.14. Precisely how large n must be for this approximation to be 
valid it is not easy to say. Samples of 1000 are almost certainly large 
enough, and we may often apply the foregoing procedure with considerable 
confidence to much smaller samples, say of 100. For samples below that 
figure it is as well to examine carefully the circumstances of any given case 
and to proceed with caution. 

We shall have more to say on this matter when we consider the sampling 
of variables (20.17 and 20.18). 

For the remainder of this chapter we shall assume that our samples 
are “large,” that is to say, that the approximations involved in our 
assumptions as to the estimate of p are valid. 

Example 19.5.—A sample of 900 days is taken from meteorological 
records of a certain district, and 100 of them are found to be foggy. What 
are the probable limits to the percentage of foggy days in the district ? 

Anticipating somewhat our discussion of simple sampling, we will 
assume that the conditions of this problem give a simple sample. 

Hence, 

\[ p = \tfrac{1}{9}, \qquad q = \tfrac{8}{9} \]

Standard error of the proportion of foggy days 

\[ = \sqrt{\frac{pq}{n}} = \sqrt{\tfrac{1}{9} \times \tfrac{8}{9} \times \tfrac{1}{900}} = 0.0105 = 1.05 \text{ per cent.} \]

Hence, taking 1/9 to be the estimate of the proportion of foggy days, we have 
that the limits are 11.11 per cent. ± 3.15 per cent., i.e. 8 per cent. and 
14.25 per cent. approximately. 

Example 19.6.—A biased penny is tossed 100 times and comes down 
heads 70 times. What are the probable limits to the probability of getting 
a head in a single trial ? 

We require to know the limits of p. If we assume that 100 is a large 
sample, we have: 

\[ \sqrt{\frac{pq}{n}} = \sqrt{\tfrac{1}{100} \times \tfrac{7}{10} \times \tfrac{3}{10}} = 0.0458 \]

The limits are therefore 

\[ 0.70 \pm (3 \times 0.0458) = 0.70 \pm 0.1374, \text{ i.e. } 0.56 \text{ and } 0.84 \text{ approximately.} \]

If we feel any doubt as to the validity of using estimates of p and q 
from a sample of 100 in calculating the standard error, we may proceed 
as follows : 

The standard error of p cannot exceed $\sqrt{\tfrac{1}{100} \times \tfrac{1}{2} \times \tfrac{1}{2}}$, i.e. 0.05. Hence 
the value of p lies almost certainly within the limits 0.70 ± 0.15, i.e. 0.55 
and 0.85. 

If p = 0.55, 

\[ \sqrt{\frac{pq}{n}} = 0.04975 \]

If p = 0.85, 

\[ \sqrt{\frac{pq}{n}} = 0.03571 \]

For intermediate values of p, $\sqrt{pq/n}$ lies between these limits. Hence the 
maximum value of the standard error is 0.04975, and p lies between the 
limits 0.70 ± 0.14925, i.e. 

0.55075 and 0.84925 

It will be seen that these limits are nearly equal to those obtained on 
the assumption that p = q = 1/2, and are not very different from those we 
got by assuming p = 0.70. There would, however, be an appreciable 
difference if p had been small, say 0.10. 
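
The two-stage procedure just described may be sketched as follows. The variable and function names are ours, and the second step relies on the fact that, in this example, the first set of limits does not include p = 1/2, so that pq is greatest at the limit nearer to 1/2.

```python
from math import sqrt

def se_of_proportion(p, n):
    """Standard error sqrt(pq/n) of a proportion."""
    return sqrt(p * (1 - p) / n)

n, observed = 100, 0.70

# Step 1: widest limits, using the greatest possible pq (p = q = 1/2).
se_max = se_of_proportion(0.5, n)
low, high = observed - 3 * se_max, observed + 3 * se_max

# Step 2: within these limits (which here exclude 1/2) the standard error
# is greatest at one of the two ends; take the larger value.
se_refined = max(se_of_proportion(low, n), se_of_proportion(high, n))
refined = (observed - 3 * se_refined, observed + 3 * se_refined)

print(round(se_max, 5), round(low, 2), round(high, 2))
print(round(se_refined, 5), [round(x, 5) for x in refined])
```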

19.15. If one of the two proportions p and q becomes very small, 
equation (19.1) may be put into an approximate form that is very useful. 
Suppose p to be the proportion that becomes very small, so that we may 
neglect $p^2$ compared with p ; then 

\[ pq = p - p^2 = p \text{ approximately} \]

and consequently we have approximately: 

\[ \sigma = \sqrt{np} = \sqrt{M} \tag{19.3} \]

That is to say, if the proportion of successes be small, the standard 
deviation of the number of successes is the square root of the mean number 
of successes. Hence we can find the standard error even though p be 
unknown, provided only we know that it is small. 

This is, in fact, the case when the binomial becomes the Poisson series 
(10.40). For such distributions the rule that a range of 6σ includes the 
great majority of the observations remains valid, as may be seen from 
the diagram on page 190, but the limits assigned to the standard error of 
the mean M may be too wide on the left of the mean. For example, if 
M = 1, σ = 1, and a range of 3 units to the left of the mean carries us to a 
value of -2, whereas there can be no part of the frequency with negative 
values of the variate. 
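
How small p must be before equation (19.3) may replace equation (19.1) can be judged from a short comparison. The sample size and the values of p below are arbitrary choices made for illustration, and the function names are ours.

```python
from math import sqrt

def sd_exact(n, p):
    """Equation (19.1): sqrt(npq)."""
    return sqrt(n * p * (1 - p))

def sd_small_p(n, p):
    """Equation (19.3): sqrt(np), the square root of the mean."""
    return sqrt(n * p)

n = 10_000
for p in (0.2, 0.05, 0.01, 0.001):
    exact, approx = sd_exact(n, p), sd_small_p(n, p)
    print(p, round(exact, 3), round(approx, 3), round(approx / exact, 4))
```

The ratio of the two values is $1/\sqrt{q}$, so the approximation overstates the standard deviation by about 12 per cent. when p = 0.2 but by only about one part in two thousand when p = 0.001.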

19.16. It will be noticed that the standard error depends only on the 
value of p and the size of the sample, and that therefore the range within 
which p probably lies is independent of the size of the universe. This 
appears a little paradoxical, because one might expect that a sample 
which was, say, 20 per cent, of the universe would enable closer limits 
to be set than one which was 10 per cent, of the universe. 

The explanation is to be found in the nature of simple sampling itself. 
We shall see below that the conditions under which simple sampling arises 
in practice are such that either the universe is actually or practically 
infinite, or each member drawn for a sample is put back in the universe 
before the next is drawn. In either case the universe is inexhaustible, 
and no sample is any nearer to including all its members than another 
sample. It is, therefore, not surprising to find that the size of the universe 
does not appear in the formula for the standard error. 

19.17. A further notable fact is that the standard error of p varies 
inversely as the square root of n, and not inversely as n itself. Thus, as 
n becomes larger the standard error becomes smaller, which is what we 
should expect, but the standard error decreases only in inverse proportion 
to the square root of n. For instance, if a sample of 100 gives us a standard 
error of 10 per cent., it will take a sample of 400 to halve that error, and 
a sample 100 times as large, i.e. 10,000, to reduce the error to one-tenth 
or one per cent. 
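
The square-root law may be seen at once from the formula. In the sketch below the value p = 1/2 is an arbitrary assumption; only the ratios of the standard errors matter, and these do not depend on p.

```python
from math import sqrt

def se_of_proportion(p, n):
    """Standard error sqrt(pq/n) of a proportion."""
    return sqrt(p * (1 - p) / n)

p = 0.5  # assumed value, chosen only for illustration
base = se_of_proportion(p, 100)
for n in (100, 400, 10_000):
    # Standard error relative to that of a sample of 100.
    print(n, round(se_of_proportion(p, n) / base, 3))
```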

Precision. 

19.18. The standard error may fairly be taken to measure the unreliability 
of an estimate of p; the greater the standard error, the greater 
the fluctuations of the observed proportion, although the true proportion 
is the same throughout. The reciprocal of the standard error (1/s), on 
the other hand, or some convenient multiple of the reciprocal (cf. 
8.15 and 10.32), may be regarded as a measure of reliability, or, as it is 
sometimes termed, precision, and consequently the reliability or precision 
of an observed proportion varies as the square root of the number of observations 
on which it is based. 

The Limitations of Simple Sampling. 

19.19. In order to realise the limitations on the use of the formulae of 
equations (19.1) and (19.2), it is necessary to consider what are the conditions 
which will give rise to simple sampling in practice. Supposing, for 
example, that we observe among groups of 1000 persons, at different times 
or in different localities, the various percentages of individuals possessing 
certain characteristics: dark hair, or blindness, or insanity, and so forth. 
Under what conditions should we expect the observed percentages to 
obey the law of sampling that we have found, and show a standard 
deviation given by equation (19.2) ? 

19.20. In the first place, the condition that p, the probability of
drawing an individual with attribute A on random sampling, remains
constant, and in particular is the same for all samples, means that the
proportion of individuals with attribute A in the universe must remain
constant at the drawing of each sample. Consequently, if formula (19.2)
is to hold good in our practical case of sampling there must not be a
difference in any essential respect (i.e. in any character that can affect
the proportion observed) between the localities from which the samples
are drawn, nor, if the samples have been made at different epochs, must
any essential change have taken place during the period over which the
observations are spread. Where the causation of the character observed
is more or less unknown, it may, of course, be difficult or impossible to
say what differences or changes are to be regarded as essential, but where
we have more knowledge the condition laid down enables us to exclude
certain cases at once from the possible applications of formula (19.1) or
(19.2). Thus it is obvious that the theory of simple sampling cannot
apply to the variations of the death-rate in localities with populations
of different age and sex composition, or to death-rates in a mixture of
healthy and unhealthy districts, or to death-rates in successive years
during a period of continuously improving sanitation. In all such cases
variations due to definite causes are superposed on the fluctuations of
sampling.

19.21. Secondly, the proportion of individuals with attribute A must
remain constant for the drawing of each individual member of the sample.
This is again a very marked limitation. To revert to the case of
death-rates, formulae (19.1) and (19.2) would not apply to the numbers of persons
dying in a series of samples of 1000 persons, even if these samples were all
of the same age and sex composition, and living under the same sanitary
conditions, unless, further, each sample only contained persons of one sex
and one age. For if each sample included persons of both sexes and
different ages, the condition would be broken, the chance of death during a given
period not being the same for the two sexes, nor for the young and the old.
The groups would not be homogeneous in the sense required by the
conditions from which our formulae have been deduced.

19.22. We pointed out in 19.3 that sampling from a finite universe
is not simple owing to the fact that the abstraction of an individual alters
the chance of success at the next trial. In practice there are three
important cases in which the condition for the constancy of p is satisfied:

(a) If the individuals are replaced at each drawing before the next
drawing is made; for in this case the constitution of the universe is the
same at each trial, and hence the chance of success must also be the same.

(b) If the universe is infinite; for in this case the withdrawal of a
finite number of members does not affect the proportion of individuals in
the universe possessing the attribute in question.

(c) If the universe is very large, p may be taken to be constant
without sensible error, provided that the sample is not also large. This is a
very important case, and justifies the application of the theory of simple
sampling to many practical data.

Suppose, for instance, we are sampling the population of the United
Kingdom for sex ratio, and decide to take a sample of 1000. Suppose
again, for the purposes of illustration, that the whole population consists
of 23 million women and 22 million men. The chance of getting a man at
the first trial will then be $\frac{22{,}000{,}000}{45{,}000{,}000}$. If we succeed in getting a man,
the chance of doing so at the second trial will be $\frac{21{,}999{,}999}{44{,}999{,}999}$. Even if we
draw 999 men the chance of success at the thousandth trial would be
$\frac{21{,}999{,}001}{44{,}999{,}001}$. All these chances, to a close approximation, are equal, and we
can assume them to be so without fear of appreciable error. The case
would, of course, have stood differently if our sample had numbered several
millions.
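To see how nearly equal these chances are, they may be evaluated exactly. This is a minimal sketch using the illustrative population of the text; the helper name `chance_of_man` is our own.

```python
from fractions import Fraction

MEN, WOMEN = 22_000_000, 23_000_000  # illustrative figures used in the text


def chance_of_man(men_already_drawn: int) -> Fraction:
    """Chance that the next individual drawn is a man, when every earlier
    draw was a man and nobody is put back."""
    return Fraction(MEN - men_already_drawn,
                    MEN + WOMEN - men_already_drawn)


first = chance_of_man(0)         # 22,000,000 / 45,000,000
second = chance_of_man(1)        # 21,999,999 / 44,999,999
thousandth = chance_of_man(999)  # 21,999,001 / 44,999,001

for label, chance in (("first", first), ("second", second),
                      ("thousandth", thousandth)):
    print(f"{label:>10} trial: {float(chance):.7f}")
print("largest change in the chance:", float(first - thousandth))
```

Even in this extreme case, where every one of the first 999 draws is a man, the chance moves by only about one part in a hundred thousand.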


19.23. A third condition for simple sampling was explicitly stated in
our definition in 19.3. The individual events must be completely
independent of one another, like the throws of a die, or sensibly so, like the
drawing of balls from a bag containing a number of balls which is large
compared with the number drawn. Reverting to the illustration of a
death-rate, our formulae would not apply even if the sample populations
were composed of persons of one age and one sex, if we were dealing, for
example, with deaths from an infectious or contagious disease. For if one
person in a certain sample has contracted the disease in question, he has
increased the possibility of others doing so, and hence of dying from the
disease. The same thing holds good for certain classes of deaths from
accident, e.g. railway accidents due to derailment, and explosions in mines:
if such an accident is fatal to one person it is probably fatal to others also,
and consequently the annual returns show large and more or less erratic
variations.

19.24. It is evident that these conditions very much limit the field of
practical cases of an economic or sociological character to which formulae
(19.1) and (19.2) can apply without considerable modification. The
formulae appear, however, to hold to a high degree of approximation in
certain biological cases, notably in the proportions of offspring of different
types obtained on crossing hybrids, and, with some limitations, to the
proportions of the two sexes at birth. It is possible, accordingly, that in
these cases all the necessary conditions are fulfilled, but this is not a
necessary inference from the mere applicability of the formulae. In the
case of the sex ratio at birth it seems doubtful whether the rule applies to
the frequency of the sexes in individual families of given numbers, but it
does apply fairly closely to the sex ratios of births in different localities,
and still more closely to the ratios in one locality during successive periods.
That is to say, if we note the number of males in a series of groups of
n births each, the standard deviation of that number is approximately
$\sqrt{npq}$, where p is the chance of a male birth; or, otherwise, $\sqrt{pq/n}$ is the
standard deviation of the proportion of male births.

Applications of Simple Sampling. 

19.25. We have already shown in examples how the theory of simple 
sampling can be used to gauge the precision of an estimate of the proportion 
of individuals in a universe which possess an attribute A, and to set limits 
outside which that proportion probably does not lie. We now turn to 
further applications of the theory in the checking and control of the
interpretation of statistical results. 

19.26. Case 1. Given the expected frequency in a sample and the
observed frequency of successes, it is desired to know whether the deviation
of the second from the first can have arisen from fluctuations of simple
sampling.

This is a case which we have discussed in Examples 19.3 and 19.4.
From the expected frequency we can calculate the standard error, and if
the deviation is more than 3 times this quantity it almost certainly did not
arise from fluctuations of random sampling.
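The rule of Case 1 may be sketched as follows. The counts below are hypothetical and chosen only for illustration (they are not the data of Examples 19.3 and 19.4), and the function name is our own; the standard error of the number of successes is taken as $\sqrt{npq}$.

```python
import math


def deviation_in_standard_errors(observed: int, n: int, p: float) -> float:
    """Deviation of the observed number of successes from its expectation
    n*p, expressed in units of the standard error sqrt(n*p*q)."""
    expected = n * p
    standard_error = math.sqrt(n * p * (1.0 - p))
    return (observed - expected) / standard_error


# Hypothetical experiment: a 3:1 expectation (p = 0.75) in n = 400 trials.
n, p = 400, 0.75
for observed in (310, 330):
    z = deviation_in_standard_errors(observed, n, p)
    verdict = ("can hardly be a fluctuation of sampling" if abs(z) > 3
               else "may have arisen by simple sampling")
    print(f"observed {observed}: deviation = {z:+.2f} standard errors, {verdict}")
```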

19.27. One caution is necessary here. If the deviation is less than
3 times the standard error, it does not follow that the expected frequency
divided by the number in the sample is really the proportion of individuals
possessing the attribute A in the universe. In other words, if the expected
value is derived from some hypothesis, such as the Mendelian hypothesis in
the case of Example 19.4, the fact that the deviation lies within the limits
of 3 times the standard error does not prove the hypothesis correct. It
only indicates that experiment and hypothesis are not in disagreement.
Furthermore, if the deviation lay without those limits, the hypothesis
would not necessarily be disproved, for the fault might lie with the
randomness of the sampling.

19.28. Case 2. Two samples from distinct materials or different
universes give proportions of A's $p_1$ and $p_2$, the numbers of observations in
the samples being $n_1$ and $n_2$ respectively. (a) Can the difference between
the two proportions have arisen merely as a fluctuation of simple sampling,
the two universes being really similar as regards the proportion of A's
therein? (b) If the difference indicated were a real one, might it vanish,
owing to fluctuations of sampling, in other samples taken in precisely the
same way? This case corresponds to the testing of an association which is
indicated by a comparison of the proportion of A's amongst B's and β's.

(a) We have no theoretical expectation in this case as to the proportion
of A's in the universe from which either sample has been taken.

Let us find, however, whether the observed difference between $p_1$ and
$p_2$ may not have arisen solely as a fluctuation of simple sampling, the
proportion of A's being really the same in both cases, and given, let us say,
by the (weighted) mean proportion in our two samples together, i.e. by

$$p_0 = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

(the best guide that we have).

Let $\epsilon_1$, $\epsilon_2$ be the standard errors in the two samples, then

$$\epsilon_1^2 = p_0 q_0/n_1, \qquad \epsilon_2^2 = p_0 q_0/n_2$$

If the samples are simple samples in the sense of the previous work, then
the mean difference between $p_1$ and $p_2$ will be zero, and the standard error
of the difference $\epsilon_{12}$, the samples being independent, will be given by

$$\epsilon_{12}^2 = \epsilon_1^2 + \epsilon_2^2 = p_0 q_0\left(\frac{1}{n_1} + \frac{1}{n_2}\right) \qquad (19.4)$$

If the observed difference is less than some three times $\epsilon_{12}$, it may have
arisen as a fluctuation of simple sampling only.

(b) If, on the other hand, the proportions of A's are not the same in the
material from which the two samples are drawn, but $p_1$ and $p_2$ are the true
values of the proportions, the standard errors of sampling in the two cases
are

$$\epsilon_1^2 = p_1 q_1/n_1, \qquad \epsilon_2^2 = p_2 q_2/n_2$$

and consequently

$$\epsilon_{12}^2 = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2} \qquad (19.5)$$

If the difference between $p_1$ and $p_2$ does not exceed some three times
this value of $\epsilon_{12}$, it may be obliterated by an error of simple sampling on
taking fresh samples in the same way from the same material.

The student will note that in arriving at these results we have assumed
that the unknown values $p_0$, $p_1$, $p_2$ are given to a sufficient degree of
approximation by estimates from the samples. This, as we have seen, is
justified if n be large.



Example 19.7. (Data from J. Gray, "Memoir on the Pigmentation
Survey of Scotland," Jour. of the Royal Anthropological Institute, vol. 37,
1907.) The following are extracted from the tables relating to hair-colour
of girls at Edinburgh and Glasgow:

               Of Medium       Total        Per cent.
               Hair-colour.    observed.    Medium.

Edinburgh         4,008           9,743       41.1
Glasgow          17,529          39,764       44.1

Can the difference observed in the percentage of girls of medium
hair-colour have arisen solely through fluctuations of sampling?

In the two towns together the percentage of girls with medium
hair-colour is 43.5 per cent. If this were the true percentage, the standard
error of sampling for the difference between percentages observed in
samples of the above sizes would be:

$$\epsilon_{12} = (43.5 \times 56.5)^{1/2}\left(\frac{1}{9743} + \frac{1}{39{,}764}\right)^{1/2} = 0.56 \text{ per cent.}$$

The actual difference is 3.0 per cent., or over 5 times this, and could not
have arisen through the chances of simple sampling.

If we assume that the difference is a real one and calculate the standard
error by equation (19.5), we arrive at the same value, viz. 0.56 per cent.
With such large samples the difference could not, accordingly, be
obliterated by the fluctuations of simple sampling alone.
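The arithmetic of Example 19.7 may be retraced as below. This is a sketch under our own naming; the Glasgow total is taken as 39,764, which agrees with the combined total of 49,507 used in Example 19.8. The two functions correspond to the pooled standard error of (19.4) and the separate-proportion standard error of (19.5).

```python
import math


def se_difference_pooled(x1: int, n1: int, x2: int, n2: int) -> float:
    """Standard error of p1 - p2 when both samples are supposed to share the
    pooled proportion p0 (case (a))."""
    p0 = (x1 + x2) / (n1 + n2)
    return math.sqrt(p0 * (1.0 - p0) * (1.0 / n1 + 1.0 / n2))


def se_difference_separate(x1: int, n1: int, x2: int, n2: int) -> float:
    """Standard error of p1 - p2 when p1 and p2 are taken as the true
    proportions of two different universes (case (b))."""
    p1, p2 = x1 / n1, x2 / n2
    return math.sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)


# Girls of medium hair-colour: (number medium, total observed).
edinburgh = (4008, 9743)
glasgow = (17529, 39764)

difference = glasgow[0] / glasgow[1] - edinburgh[0] / edinburgh[1]
e_pooled = se_difference_pooled(*edinburgh, *glasgow)
e_separate = se_difference_separate(*edinburgh, *glasgow)

print(f"difference          : {100 * difference:.2f} per cent.")
print(f"pooled s.e.         : {100 * e_pooled:.2f} per cent.")
print(f"separate s.e.       : {100 * e_separate:.2f} per cent.")
print(f"difference / s.e.   : {difference / e_pooled:.2f}")
```

Working from the raw counts rather than the rounded percentages gives a difference slightly under 3.0 per cent., but the ratio to the standard error still exceeds 5.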

19.29. Case 3. Two samples are drawn from distinct material or
different universes, as in the last case, giving proportions of A's $p_1$ and $p_2$,
but in lieu of comparing the proportion $p_1$ with $p_2$ it is compared with
the proportion of A's in the two samples together, viz. $p_0$, where, as before,

$$p_0 = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

Required to find whether the difference between $p_1$ and $p_0$ can have
arisen as a fluctuation of simple sampling, $p_0$ being the true proportion
of A's in both samples.

This case corresponds to the testing of an association which is indicated
by a comparison of the proportion of A's amongst the B's with the
proportion of A's in the universe. The general treatment is similar to that
of Case 2, but the work is complicated owing to the fact that errors in
$p_1$ and $p_0$ are not independent.

If $\epsilon_{01}$ be the standard error of the difference between $p_1$ and $p_0$, we
have at once:

$$\epsilon_{01}^2 = \epsilon_0^2 + \epsilon_1^2 - 2 r_{01}\epsilon_0\epsilon_1 = p_0 q_0\left[\frac{1}{n_1 + n_2} + \frac{1}{n_1} - \frac{2 r_{01}}{\sqrt{n_1 (n_1 + n_2)}}\right]$$

$r_{01}$ being the correlation between errors of simple sampling in $p_1$ and $p_0$.
But from the above equation relating $p_0$ to $p_1$ and $p_2$, writing it in terms
of deviations in $p_0$, $p_1$ and $p_2$, multiplying by the deviation in $p_1$ and
summing, we have, since errors in $p_1$ and $p_2$ are uncorrelated:

$$r_{01}\epsilon_0\epsilon_1 = \frac{n_1}{n_1 + n_2}\,\epsilon_1^2$$

Therefore finally:

$$\epsilon_{01}^2 = \epsilon_0^2 + \epsilon_1^2 - \frac{2 n_1}{n_1 + n_2}\,\epsilon_1^2 = p_0 q_0\,\frac{n_2}{(n_1 + n_2)\,n_1} \qquad (19.6)$$


Unless the difference between $p_0$ and $p_1$ exceed, say, some three times
this value of $\epsilon_{01}$, it may have arisen solely by the chances of simple
sampling.

It will be observed that if $n_1$ be very small compared with $n_2$, $\epsilon_{01}$
approaches, as it should, the standard error for a sample of $n_1$ observations.

We omit, in this case, the allied problem whether, if the difference
between $p_1$ and $p_0$ indicated by the samples were real, it might be wiped
out in other samples of the same size by fluctuations of simple sampling
alone. The solution is a little complex, as we no longer have 


Example 19.8. Taking now the figures of Example 19.7, suppose
that we had compared the proportion of girls of medium hair-colour in
Edinburgh with the proportion in Glasgow and Edinburgh together.
The former is 41.1 per cent., the latter 43.5 per cent., difference 2.4 per cent.
The standard error of the difference between the percentages observed in
the sub-sample of 9743 observations and the entire sample of 49,507
observations is, therefore,

$$\epsilon_{01} = (43.5 \times 56.5)^{1/2}\sqrt{\frac{39{,}764}{49{,}507 \times 9743}} = 0.45 \text{ per cent.}$$

The actual difference is over five times this (the ratio must, of course, be
the same as in Example 19.7), and could not have occurred as a mere
error of sampling.
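The statement that the ratio must be the same as in Example 19.7 can be confirmed numerically. This is a sketch with function names of our own choosing, using the counts of Example 19.7 (Glasgow total taken as 39,764); the first function is equation (19.6), the second the pooled standard error of Case 2.

```python
import math


def se_part_vs_whole(p0: float, n1: int, n2: int) -> float:
    """Equation (19.6): standard error of p1 - p0, where p0 is the
    proportion in both samples together."""
    return math.sqrt(p0 * (1.0 - p0) * n2 / ((n1 + n2) * n1))


def se_two_samples(p0: float, n1: int, n2: int) -> float:
    """Pooled standard error of p1 - p2 for two independent samples."""
    return math.sqrt(p0 * (1.0 - p0) * (1.0 / n1 + 1.0 / n2))


x1, n1 = 4008, 9743      # Edinburgh: medium hair-colour, total observed
x2, n2 = 17529, 39764    # Glasgow

p1, p2 = x1 / n1, x2 / n2
p0 = (x1 + x2) / (n1 + n2)

e01 = se_part_vs_whole(p0, n1, n2)
ratio_case3 = (p0 - p1) / e01
ratio_case2 = (p2 - p1) / se_two_samples(p0, n1, n2)

print(f"standard error (19.6): {100 * e01:.2f} per cent.")
print(f"ratio, part against whole : {ratio_case3:.3f}")
print(f"ratio, sample against sample: {ratio_case2:.3f}")
```

The two ratios agree because $p_0 - p_1 = n_2 (p_2 - p_1)/(n_1 + n_2)$, and the standard errors stand in the same proportion.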


Effect of Removing the Limitations of Simple Sampling. 

19.30. Let us now consider the effect on the standard error of the
removal of the conditions of simple sampling which we discussed in
19.19 to 19.24.

The breakdown of the condition we discussed in 19.20, namely, that
the proportion of A's in the universe should remain constant for all
samples, might occur if we took a number of samples from a changing
universe or from different strata of a universe which was not homogeneous.

We may represent such circumstances in a case of artificial chance by
supposing that for the first $f_1$ throws of n dice the chance of success for
each die is $p_1$, for the next $f_2$ throws $p_2$, for the next $f_3$ throws $p_3$, and so
on, the chance of success varying from time to time, just as the chance
of death, even for individuals of the same age and sex, varies from district
to district. Suppose, now, that the records of all these throws are pooled
together. The mean number of successes per throw of the n dice is given
by

$$M = \frac{n}{N}(f_1 p_1 + f_2 p_2 + f_3 p_3 + \dots) = n p_0$$

where $N = S(f)$ is the whole number of throws, and $p_0$ is the mean value
$S(fp)/N$ of the varying chance p. To find the standard deviation of the
number of successes at each throw, consider that the first set of throws
contributes to the sum of the squares of deviations an amount

$$f_1\left[n p_1 q_1 + n^2 (p_1 - p_0)^2\right]$$

$n p_1 q_1$ being the square of the standard deviation for these throws, and
$n(p_1 - p_0)$ the difference between the mean number of successes for the
first set and the mean for all the sets together. Hence the standard
deviation $\sigma$ of the whole distribution is given by the sum of all quantities
like the above, or

$$N\sigma^2 = n S(fpq) + n^2 S\{f(p - p_0)^2\}$$

Let $\sigma_p$ be the standard deviation of p, then the last sum is $N n^2 \sigma_p^2$,
and substituting 1 - p for q, we have:

$$\sigma^2 = n p_0 - n p_0^2 - n\sigma_p^2 + n^2\sigma_p^2 = n p_0 q_0 + n(n - 1)\sigma_p^2 \qquad (19.7)$$

This is the formula corresponding to equation (19.1); if we deal with
the standard deviation of the proportion of successes, instead of that of
the absolute number, we have, dividing through by $n^2$, the formula
corresponding to equation (19.2), viz.

$$s^2 = \frac{p_0 q_0}{n} + \frac{n - 1}{n}\,\sigma_p^2 \qquad (19.8)$$
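Equation (19.7) may be checked against a direct calculation. The sketch below pools the exact binomial distributions for several sets of throws and compares the variance of the pooled record with the formula. The chances, frequencies and function names are hypothetical illustrations of our own.

```python
import math


def binomial_pmf(n: int, p: float) -> list[float]:
    """Chances of 0, 1, ..., n successes in a throw of n dice."""
    return [math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]


def pooled_variance(n: int, chances: list[float], freqs: list[int]) -> float:
    """Variance of the number of successes per throw when the records of
    all sets of throws are pooled together."""
    total = sum(freqs)
    pooled = [0.0] * (n + 1)
    for p, f in zip(chances, freqs):
        for k, prob in enumerate(binomial_pmf(n, p)):
            pooled[k] += f / total * prob
    mean = sum(k * w for k, w in enumerate(pooled))
    return sum((k - mean) ** 2 * w for k, w in enumerate(pooled))


def formula_19_7(n: int, chances: list[float], freqs: list[int]) -> float:
    """sigma^2 = n*p0*q0 + n*(n-1)*sigma_p^2."""
    total = sum(freqs)
    p0 = sum(f * p for p, f in zip(chances, freqs)) / total
    var_p = sum(f * (p - p0) ** 2 for p, f in zip(chances, freqs)) / total
    return n * p0 * (1.0 - p0) + n * (n - 1) * var_p


# Hypothetical experiment: 12 dice; the chance changes between sets of throws.
n = 12
chances = [0.2, 0.5, 0.7]   # p1, p2, p3
freqs = [30, 50, 20]        # f1, f2, f3

direct = pooled_variance(n, chances, freqs)
by_formula = formula_19_7(n, chances, freqs)
p0 = sum(f * p for p, f in zip(chances, freqs)) / sum(freqs)
simple = n * p0 * (1.0 - p0)  # what simple sampling would give

print(f"direct: {direct:.4f}  formula (19.7): {by_formula:.4f}  simple: {simple:.4f}")
```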


19.31. If n be large and $s_0$ be the standard error calculated from
the mean proportion of successes $p_0$, equation (19.8) is sensibly of the
form

$$s^2 = s_0^2 + \sigma_p^2$$

We have thus analysed $s^2$ into two parts, $s_0^2$ the portion due to
deviations from the mean $p_0$, and $\sigma_p^2$ the portion due to variations of the
p's about their mean. The former we may regard as the contribution to
$s^2$ due to chance fluctuations; the latter as the contribution due to real
variation of the proportions among the different strata of the universe.

In conformity with later work we shall continue to call s (or $\sigma$ if we
are dealing with frequencies) the standard error, although the sampling
is no longer simple. The deviation s is still, in fact, the standard deviation
of the various sample values of p about the mean value. The term
$s_0$ (or $\sqrt{n p_0 q_0}$), on the other hand, is what the standard error would have
been if the sampling had been simple, and from the above equation we
accordingly see that the effect of the breakdown of the first condition for
simple sampling is to increase the standard error.

The values of $\sqrt{s^2 - s_0^2}$ are tabulated at the foot of Table 19.1, which
shows data relating to the deaths of women in childbirth in certain groups
of districts.

The values of $\sqrt{s^2 - s_0^2}$ suggest an almost uniform value of $\sigma_p$, about
0.8, in the deaths of women per 1000 births, i.e. that in each of the
categories "number of births in the decade" there is real variability in
the chances of individual women succumbing.



Table 19.1. Showing Frequencies of Registration Districts in England and Wales
with Different Proportions of Deaths in Childbirth (including Deaths from Puerperal
Fever) per 1000 Births in the Same Year. (Data from Decennial Supplement to
Fifty-fifth Annual Report of Registrar-General for England and Wales. Decade
1881-90.)


Number of Buths m the Decade. 


Deaths in 

— 

— 

_ 

— 

— 

-- - 

- 

Childbirth per 
1000 Births. 

1500 

to 

3500 

to 

4500 

to 

10,000 

to 

15,000 

to 

30,000 

to 

50,000 

to 


2500. 

4000 

5000 

“ 

15,000. 

20,000 

50,000 

__ 

90,000 

J 5- 2*0 



2 





2 0-25 

2 5- 3 0 

3 0- 3 5 

1 

1 

1 

» 

5 

1 

1 

2 

1 

4 


{ 

2 

3 5- 4 0 

5 

6 

5 

8 

f> 

5 

9 

4 0-45 

0 

5 

8 

23 

4 

9 

0 

4 5-50 

2 

5 

9 

14 

11 

7 

5 

5 0- 5*5 

7 

3 

6 

14 

0 

8 

7 

5 5- 6 0 

5 

3 

4 

5 

2 

5 

4 

6 0- 0 5 

1 

5 

1 

— 

4 

1 

l 

0 5-70 

8 

1 

1 

3 

- 

o 

1 

7*0- 7*5 

1 

L 

- 


-- 

4 


7*5- 8*0 

8 0- 8*5 

- 

— 

- 

I 


1 

1 

8 5- 9 0 

9 0- 9 5 

9 5-100 

1 

I 

1 

1 

1 

- 

1 

1 

1 


1 

10 0-10 5 

10 5-110 

1 


- 

! 

1 


■ 

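Number of Births          1500   3500   4500   10,000   15,000   30,000   50,000
in the Decade              to     to     to      to       to       to       to
                          2500   4000   5000   15,000   20,000   50,000   90,000

Total                      30     38     40      73       33       43       35
Mean                      5.29   4.71   4.45    4.68     4.99     5.13     4.64
Standard deviation (s)    1.77   1.37   1.09    1.01     0.99     1.12     0.87
Theoretical standard
  deviation corresponding
  to mean births ($s_0$)  1.62   1.12   0.97    0.61     0.53     0.36     0.26
$\sqrt{s^2 - s_0^2}$      0.71   0.80   0.51    0.80     0.84     1.07     0.83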


The figures of this case also bring out clearly one important consequence
of (19.8), viz. that if we make n large, s becomes sensibly equal to $\sigma_p$,
while if we make n small, s becomes more nearly equal to $\sqrt{p_0 q_0/n}$. Hence,
if we want to know the significant standard deviation of the proportion p
(the measure of its fluctuation owing to definite causes), n should be
made as large as possible; if, on the other hand, we want to obtain good
illustrations of the theory of simple sampling, n should be made small.
If n be very large, the actual standard error may evidently become almost
indefinitely large compared with the standard deviation of simple sampling.
Thus during the twenty years 1855-74 the death-rate in England and Wales
fluctuated round a mean value of 22.2 per thousand with a standard
deviation (s) of 0.86. Taking the mean population as roughly 21 millions,
the standard deviation of simple sampling ($s_0$) is approximately

$$\sqrt{\frac{22 \times 978}{21 \times 10^6}} = 0.032 \text{ per thousand}$$

This is only about one twenty-seventh of the actual value.
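The death-rate figures just quoted can be reworked as follows; a minimal sketch in which the variable names are our own and the population is the rough 21 millions of the text.

```python
import math

mean_rate_per_1000 = 22.2   # mean death-rate, 1855-74
population = 21e6           # rough mean population
s_actual_per_1000 = 0.86    # observed standard deviation of the rate

p = mean_rate_per_1000 / 1000.0
# Standard deviation of simple sampling, expressed per thousand.
s0_per_1000 = 1000.0 * math.sqrt(p * (1.0 - p) / population)
ratio = s_actual_per_1000 / s0_per_1000

# With n so large, s^2 = s0^2 + sigma_p^2 leaves nearly everything to sigma_p.
sigma_p_per_1000 = math.sqrt(s_actual_per_1000**2 - s0_per_1000**2)

print(f"s0 = {s0_per_1000:.3f} per thousand")
print(f"actual s is about {ratio:.0f} times s0")
print(f"sigma_p = {sigma_p_per_1000:.3f} per thousand")
```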

19.32. Now consider the effect of altering the second condition of
simple sampling dealt with in 19.21, viz. the circumstances that regulate
the appearance of the character observed shall be the same for every
individual or every sub-class in each of the universes from which samples
are drawn. Suppose that in a group of n dice thrown the chances for
$m_1$ dice are $p_1$, $q_1$; for $m_2$ dice, $p_2$, $q_2$, and so on, the chances varying for
different dice, but being constant throughout the experiment. The case
differs from the last, as in that the chances were the same for every die,
at any one throw, but varied from one throw to another; now they are
constant from throw to throw, but differ from one die to another as they
would in any ordinary set of badly made dice. Required to find the effect
of these differing chances.

For the mean number of successes we evidently have:

$$M = m_1 p_1 + m_2 p_2 + m_3 p_3 + \dots = n p_0$$

$p_0$ being the mean chance $S(mp)/n$. To find the standard deviation of the
number of successes at each throw, it should be noted that this may be
regarded as made up of the number of successes in the $m_1$ dice for which the
chances are $p_1$, $q_1$, together with the number of successes amongst the $m_2$ dice
for which the chances are $p_2$, $q_2$, and so on; and these numbers of successes
are all independent. Hence,

$$\sigma^2 = m_1 p_1 q_1 + m_2 p_2 q_2 + \dots = S(mpq)$$

Substituting 1 - p for q, as before, and using $\sigma_p$ to denote the standard
deviation of p,

$$\sigma^2 = n p_0 q_0 - n\sigma_p^2 \qquad (19.9)$$

or if s be, as before, the standard error of the proportion of successes,

$$s^2 = \frac{p_0 q_0}{n} - \frac{\sigma_p^2}{n} \qquad (19.10)$$

Hence, in this case the standard error s is less than the standard error
of simple sampling.
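Equation (19.9) may likewise be verified by building up the exact distribution of the number of successes for a set of unequal dice. The chances below are hypothetical and the function name is our own; the sketch assumes, as in the text, that the dice are independent.

```python
import math


def success_distribution(chances: list[float]) -> list[float]:
    """Exact distribution of the number of successes among independent dice
    with differing chances, built up one die at a time."""
    dist = [1.0]
    for p in chances:
        new = [0.0] * (len(dist) + 1)
        for k, weight in enumerate(dist):
            new[k] += weight * (1.0 - p)   # this die fails
            new[k + 1] += weight * p       # this die succeeds
        dist = new
    return dist


# Hypothetical set of 12 badly made dice: four each with chances 0.1, 0.5, 0.9.
chances = [0.1] * 4 + [0.5] * 4 + [0.9] * 4
n = len(chances)

dist = success_distribution(chances)
mean = sum(k * w for k, w in enumerate(dist))
variance = sum((k - mean) ** 2 * w for k, w in enumerate(dist))

p0 = sum(chances) / n
var_p = sum((p - p0) ** 2 for p in chances) / n
by_formula = n * p0 * (1.0 - p0) - n * var_p   # equation (19.9)
simple = n * p0 * (1.0 - p0)                   # simple sampling

print(f"exact: {variance:.4f}  formula (19.9): {by_formula:.4f}  simple: {simple:.4f}")
```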

19.33. The extent to which the standard error is affected may
conceivably be considerable. To take a limiting case, if p be zero for half the
events and unity for the remainder, $p_0 = q_0 = \tfrac{1}{2}$ and $\sigma_p = \tfrac{1}{2}$, so that s is zero.
To take another illustration, still somewhat extreme, if the values of p
are uniformly distributed over the whole range between 0 and 1, $p_0 = q_0 = \tfrac{1}{2}$
as before, but $\sigma_p^2 = 1/12 = 0.0833$ (8.14, p. 143). Hence, $s^2 = 0.1667/n$,
$s = 0.408/\sqrt{n}$, instead of $0.5/\sqrt{n}$, the value of s if the chances are $\tfrac{1}{2}$ in every
case. In most practical cases, however, the effect will be much less. Thus
the standard deviation of simple sampling for a death-rate of, say, 12 per
thousand in a population of uniform age and one sex is $(12 \times 988)^{1/2}/\sqrt{n}
= 109/\sqrt{n}$. In a population of the age composition of that of England
and Wales, however, the death-rate is not, of course, uniform, but varies
from a high value in infancy (say 64 per thousand), through very low
values (2 to 3 per thousand) in childhood to continuously increasing values
in old age; the standard deviation of the rate within such a population
is roughly about 24 per thousand. But the effect of this variation on the
standard deviation of simple sampling is quite small, for, as calculated from
equation (19.10),

$$s^2 = \frac{1}{n}(12 \times 988 - 576), \qquad s = 106/\sqrt{n}$$

as compared with $109/\sqrt{n}$.
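The numerical illustrations of 19.33 follow directly from (19.10). In this sketch the helper name is our own, and the death-rates are converted from "per thousand" to proportions before the formula is applied.

```python
import math


def root_n_times_s(p0: float, sd_p: float) -> float:
    """sqrt(n) * s from equation (19.10): sqrt(p0*q0 - sigma_p^2)."""
    return math.sqrt(p0 * (1.0 - p0) - sd_p**2)


# Limiting case: p is 0 for half the events and 1 for the rest.
limiting = root_n_times_s(0.5, 0.5)

# p uniformly distributed between 0 and 1: sigma_p^2 = 1/12.
uniform = root_n_times_s(0.5, math.sqrt(1.0 / 12.0))

# Death-rate of 12 per thousand, with and without a standard deviation of
# 24 per thousand in the rate within the population (results per thousand).
simple_rate = 1000.0 * root_n_times_s(0.012, 0.0)
mixed_rate = 1000.0 * root_n_times_s(0.012, 0.024)

print(f"limiting case : s = {limiting:.3f}/sqrt(n)")
print(f"uniform p     : s = {uniform:.3f}/sqrt(n)")
print(f"uniform rate  : s = {simple_rate:.0f}/sqrt(n) per thousand")
print(f"varying rate  : s = {mixed_rate:.0f}/sqrt(n) per thousand")
```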

19.34. We have, finally, to pass to the condition referred to in 19.23,
and to discuss the effect of a certain amount of dependence between the
several "events" in each sample. We shall suppose, however, that the
two other conditions are fulfilled, the chances p and q being the same for
every event at every trial, and constant throughout the experiment. The
standard deviation for each event is $(pq)^{1/2}$ as before, but the events are no
longer independent; instead, therefore, of the simple expression

$$\sigma^2 = npq$$

we must have (cf. 16.2, p. 297)

$$\sigma^2 = npq + 2pq(r_{12} + r_{13} + \dots + r_{23} + \dots)$$

where $r_{12}$, $r_{13}$, etc. are the correlations between the results of the first and
second, first and third events, and so on: correlations for variables (number
of successes) which can only take the values 0 and 1, but may
nevertheless be treated as ordinary variables. There are n(n - 1)/2 correlation
coefficients, and if, therefore, r is the arithmetic mean of the correlations,
we may write:

$$\sigma^2 = npq\{1 + r(n - 1)\} \qquad (19.11)$$

The standard deviation of simple sampling will therefore be increased or
diminished according as the average correlation between the results of
the single events is positive or negative, and the effect may be considerable,
as $\sigma$ may be reduced to zero or increased to $n(pq)^{1/2}$. For the standard
deviation of the proportion of successes in each sample we have the
equation

$$s^2 = \frac{pq}{n}\{1 + r(n - 1)\} \qquad (19.12)$$

19.35. It should be noted that, as the means and standard deviations
for our variables are all identical, r is the correlation coefficient for a table
formed by taking all possible pairs of results in the n events of each sample.

It should also be noted that the case when r is positive covers the
departure from the rules of simple sampling discussed in 19.30-19.31;
for if we draw successive samples from different records, this introduces
the positive correlation at once, even although the results of the events at
each trial are quite independent of one another. Similarly, the case
discussed in 19.32-19.33 is covered by the case when r is negative; for if
the chances are not the same for every event at each trial, and the chance
of success for some one event is above the average, the mean chance of
success for the remainder must be below it. The present case is, however,
best kept distinct from the other two, since a positive or negative correlation
may arise for reasons quite different from those discussed in 19.30-19.33.

19.36. As a simple illustration, consider the important case of
sampling from a limited universe, e.g. of drawing n balls in succession from the
whole number w in a bag containing pw white balls and qw black balls.
On repeating such drawings a large number of times, we are evidently
equally likely to get a white ball or a black ball for the first, second or nth
ball of the sample; the correlation table formed from all possible pairs of
every sample will therefore tend in the long run to give just the same form
of distribution as the correlation table formed from all possible pairs of
the w balls in the bag. But from 13.32, page 257, we know that the
correlation coefficient for this table is $-1/(w - 1)$, whence

$$\sigma^2 = npq\,\frac{w - n}{w - 1}$$

If n = 1, we have the obviously correct result that $\sigma = (pq)^{1/2}$, as in
drawing from unlimited material; if, on the other hand, n = w, $\sigma$ becomes zero
as it should, and the formula is thus checked for simple cases. For
drawing 2 balls out of 4, $\sigma$ becomes $0.816(npq)^{1/2}$; for drawing 5 balls out of
10, $0.745(npq)^{1/2}$; in the case of drawing half the balls out of a very large
number, it approximates to $(0.5npq)^{1/2}$, or $0.707(npq)^{1/2}$.
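The reducing factors quoted in 19.36 may be recomputed, and the formula compared with an exact enumeration of drawings without replacement. The sketch below uses function names of our own; the bag of 10 balls with 4 white is a hypothetical illustration.

```python
import math


def finite_universe_factor(n: int, w: int) -> float:
    """Factor by which sqrt(npq) is multiplied when n balls are drawn,
    without replacement, from a bag of w balls: sqrt((w - n)/(w - 1))."""
    return math.sqrt((w - n) / (w - 1))


def exact_sd_white(n: int, w: int, white: int) -> float:
    """Standard deviation of the number of white balls among n drawn without
    replacement from w balls, by enumerating every possible count."""
    black = w - white
    total = math.comb(w, n)
    pmf = {k: math.comb(white, k) * math.comb(black, n - k) / total
           for k in range(max(0, n - black), min(n, white) + 1)}
    mean = sum(k * pr for k, pr in pmf.items())
    return math.sqrt(sum((k - mean) ** 2 * pr for k, pr in pmf.items()))


print(f"2 out of 4   : {finite_universe_factor(2, 4):.3f}")
print(f"5 out of 10  : {finite_universe_factor(5, 10):.3f}")
print(f"half of many : {finite_universe_factor(1_000_000, 2_000_000):.3f}")

# Hypothetical bag: 10 balls of which 4 are white (p = 0.4), 5 drawn.
n, w, white = 5, 10, 4
p = white / w
from_formula = finite_universe_factor(n, w) * math.sqrt(n * p * (1.0 - p))
print(f"exact s.d. {exact_sd_white(n, w, white):.4f}, formula {from_formula:.4f}")
```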

19.37. In the case of contagious or infectious diseases, or of certain
forms of accident that are apt, if fatal at all, to result in wholesale deaths,
r is positive, and if n be large (as it usually is in such cases), a very small
value of r may easily lead to a very great increase in the observed standard
deviation. It is difficult to give a really good example from actual statistics,
as the conditions are hardly ever constant from one year to another, but the
following will serve to illustrate the point. During the twenty years
1887-1906 there were 2107 deaths from explosions of firedamp or coal-dust in the
coal-mines of the United Kingdom, or an average of 105 deaths per annum.
From 19.15 it follows that this should be the square of the standard
deviation of simple sampling, or the standard deviation itself
approximately 10.3. But the square of the actual standard deviation (the
standard error) is 7178, or its value 84.7, the numbers of deaths ranging
between 14 (in 1903) and 317 (in 1894). This large standard deviation, to
judge from the figures, is partly, though not wholly, due to a general
tendency to decrease in the numbers of deaths from explosions in spite of a
large increase in the number of persons employed; but even if we ignore
this, the magnitude of the standard deviation can be accounted for by a
very small value of the correlation r, expressive of the fact that if an
explosion is sufficiently serious to be fatal to one individual, it will probably
be fatal to others also. For if $\sigma_0$ denote the standard deviation of simple
sampling, $\sigma$ the standard deviation of sampling given by equation (19.11),
we have:

$$r = \frac{\sigma^2 - \sigma_0^2}{(n - 1)\sigma_0^2}$$

Whence, from the above data, taking the numbers of persons employed
underground at a rough average of 560,000,

$$r = \frac{7073}{560{,}000 \times 105} = +0.00012$$
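The value of r follows from solving (19.11) for r. A minimal sketch with the figures of the text; the function name is our own.

```python
def mean_correlation(var_observed: float, var_simple: float, n: int) -> float:
    """Solve sigma^2 = sigma_0^2 * (1 + r*(n - 1)) for r."""
    return (var_observed - var_simple) / ((n - 1) * var_simple)


var_simple = 105.0      # square of the s.d. of simple sampling (mean deaths)
var_observed = 7178.0   # square of the actual standard deviation
n = 560_000             # rough average number employed underground

r = mean_correlation(var_observed, var_simple, n)
print(f"r = {r:+.5f}")

# Substituting r back into (19.11) recovers the observed variance.
recovered = var_simple * (1.0 + r * (n - 1))
print(f"recovered sigma^2 = {recovered:.0f}")
```

So slight a correlation, acting over more than half a million persons, raises the variance nearly seventy-fold.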


19.38. Summarising the preceding paragraphs, 19.30-19.37, we see 
that if the chances p and q differ for the various universes, districts, years, 
materials, or whatever they may lie from which the samples arc drawn, 
the standard deviation observed (the standard error) will be greater than 
the standard deviation of simple sampling, as calculated from the average- 
values of the chances ; if the average chances are the same for each universe 
from which a sample is drawn, but vary from individual to individual or 
from one sub-class to another within the universe, the standard deviation 
observed (the standard error) will be less than the standard deviation of 
simple sampling as calculated from the mean values of the chances ; finally, 
if p and q are constant, but the events are no longer independent, the 
observed standard deviation (the standard error) will be greater or less 
than the simplest theoretical value according as the correlation between 
the results of the single events is positive or negative. These conclusions 
further emphasise the need for caution in the use of standard errors. If we 
find that the standard deviation in some ease of sampling exceeds the 
standard deviation of simple sampling, two interpretations are possible : 
either that p and q are different in the various universes from which samples 
have been drawn (i.e. that the variations are more or less significant), or
that the results of the events are positively correlated inter se. If the
actual standard deviation fall short of the standard deviation of simple
sampling two interpretations are again possible: either that the chances p
and q vary for different individuals or sub-classes in each universe, while 
approximately constant from one universe to another, or that the results 
of the events are negatively correlated inter se. Even if the actual standard
deviation approaches closely to the standard deviation of simple sampling, 
it is only a conjectural and not a necessary inference that all the conditions 
of “simple sampling” are fulfilled. Possibly, for example, there may be a
positive correlation r between the results of the different events, masked 
by a variation of the chances p and q in sub-classes of each universe. 

An Alternative Approach. 

19.39. The results of this chapter have been studied from a rather 
different point of view by a continental school of statisticians, among whose 
names those of Lexis and Charlier are prominent. 

Lexis considers a number of samples of n individuals in which the
proportions of successes observed are $p_1, p_2, \ldots, p_N$ and sets himself
to investigate the nature of the universe from which they were drawn:
whether it is homogeneous and the samples may be regarded as obtained
by simple sampling, whether it varies in time or place so that the samples
are not simple, and so on. He takes p to be the mean of the observed
values $p_1, \ldots, p_N$, and writes:

$$r = 0.67449\sqrt{\frac{pq}{n}}$$

He then defines

$$R = 0.67449\sqrt{\frac{\sum (p_k - p)^2}{N-1}}$$

where the summation extends over all values of $p_1, \ldots, p_N$, and writes

$$Q = \frac{R}{r}$$


19.40. Now, if the sampling is simple we may, in large samples, take 
the mean p to be an estimate of the true value, and r to be an estimate of 
the probable error of simple sampling of p. Also, we may take the quantity 
R to be an estimate of the probable error of p (see 23.5). 

Hence, for large samples, R is approximately equal to r, and $Q \approx 1$.
This case, which is what we have called simple sampling, Lexis calls
“normal dispersion.”

19.41. On the other hand, if the universe is not constant while the 
samples are drawn, or if they come from different parts of a patchy universe, 
we get the case discussed in 19.30. R is no longer an estimate of the
probable error of a constant p, but may be split into two parts, one due to 
the sampling fluctuations of the observed values of p round the mean value, 
the other due to the variations of the true values round that mean. R will 
therefore be greater than r, as may be seen from equation (19.8), and 
$Q > 1$. This case Lexis calls “supernormal dispersion.”

19.42. Similarly, in the case discussed in 19.32 we get R less than r,
and hence $Q < 1$. This case Lexis calls “subnormal dispersion,” and
speaks of the data which give rise to it as “constrained” (gebundene).

The quantity Q is analogous to a quantity $\chi^2$, which we shall consider
at some length in Chapter 22 in discussing the significance of the deviations
of observed frequencies from theoretical expectation. 
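The ratio Q of 19.39 to 19.42 can be sketched as follows, assuming the definitions of r and R read as $r = 0.67449\sqrt{pq/n}$ and $R = 0.67449\sqrt{\sum(p_k-p)^2/(N-1)}$. The function name and the three proportions in the usage line are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def lexis_ratio(proportions, n):
    """Return Q = R / r for N observed proportions, each from a sample of n."""
    N = len(proportions)
    p = sum(proportions) / N          # mean of the observed proportions
    q = 1.0 - p
    r = 0.67449 * math.sqrt(p * q / n)
    R = 0.67449 * math.sqrt(sum((pk - p) ** 2 for pk in proportions) / (N - 1))
    return R / r

# Hypothetical data: three samples of 100 with proportions 0.4, 0.5, 0.6.
Q = lexis_ratio([0.4, 0.5, 0.6], 100)
print(Q)
```

The factor 0.67449 cancels in the ratio, so Q would be unchanged if standard errors were used in place of probable errors. A value well above 1 corresponds to what Lexis calls supernormal dispersion, a value well below 1 to subnormal dispersion.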


SUMMARY. 

1. Under simple sampling conditions, the proportion of successes in a 
sample may be taken as an estimate of the proportion of successes in the
parent universe. 

2. If p is the proportion of successes in the universe, the standard error
of simple sampling of the number of successes is given by

$$\sigma = \sqrt{npq}$$

and of the proportion of successes by

$$\sigma = \sqrt{\frac{pq}{n}}$$

3. The probability that an observed number of successes deviates from 
the expected number by more than three times the standard error is very
small. This fact enables us to set limits to the range within which the
observed frequency lies when we know the theoretical frequency. 

4. For large samples, the observed frequency of successes may be used 
to calculate the standard error, and this fact enables us to set limits to 
the range within which the theoretical frequency lies when we know the
observed frequency. 

5. For several samples, if the chance of success varies from sample to 
sample but remains constant within a sample, the standard error of the 
number of successes is given by

$$\sigma^2 = n p_0 q_0 + n(n-1)\sigma_p^2$$

and of the proportion of successes by

$$\sigma^2 = \frac{p_0 q_0}{n} + \frac{n-1}{n}\sigma_p^2$$

where $p_0$ is the mean of the varying chance of success, $\sigma_p$ is the standard
deviation of p, and n is the number of individuals in each sample.

If n is large and $s_0$ is the standard deviation calculated from the mean
$p_0$, this last equation is approximately

$$\sigma^2 = s_0^2 + \sigma_p^2$$
6. If the chance of success varies between the individuals of a sample 
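but does not vary as between the different samples,

$$\sigma^2 = n p_0 q_0 - n\sigma_p^2$$

for the number of successes, and

$$\sigma^2 = \frac{p_0 q_0}{n} - \frac{\sigma_p^2}{n}$$

for the proportion of successes.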

7. If the chance of success remains constant for each member of each 
sample, but the events are not independent, 

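$$\sigma^2 = npq\{1 + r(n-1)\}$$

for the number of successes, and

$$\sigma^2 = \frac{pq}{n}\{1 + r(n-1)\}$$

for the proportion of successes, where r is the mean of the correlations between the results of the events.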
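As a rough sketch, the formulas of items 2, 5, 6 and 7 for the number of successes can be compared side by side. The formulas are taken as $npq$, $np_0q_0 + n(n-1)\sigma_p^2$, $np_0q_0 - n\sigma_p^2$ and $npq\{1+r(n-1)\}$; the function names and the numerical inputs are assumptions made for illustration.

```python
import math

def se_simple(n, p):
    """Item 2: standard error of the number of successes, sqrt(npq)."""
    return math.sqrt(n * p * (1 - p))

def var_between_samples(n, p0, sd_p):
    """Item 5: chance varies from sample to sample."""
    return n * p0 * (1 - p0) + n * (n - 1) * sd_p ** 2

def var_within_samples(n, p0, sd_p):
    """Item 6: chance varies between individuals within a sample."""
    return n * p0 * (1 - p0) - n * sd_p ** 2

def var_correlated(n, p, r):
    """Item 7: constant chance, events correlated with mean correlation r."""
    return n * p * (1 - p) * (1 + r * (n - 1))

# Hypothetical inputs: n = 100, mean chance 0.5, sd of chance 0.1.
simple = se_simple(100, 0.5) ** 2
print(simple, var_between_samples(100, 0.5, 0.1), var_within_samples(100, 0.5, 0.1))
```

With these inputs the between-sample variance exceeds the simple-sampling variance and the within-sample variance falls below it, which is the ordering stated in 19.38.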


EXERCISES. 

19.1. (Ref. (308): total of columns of all the 18 tables given.)

Compare the actual with the theoretical mean and standard deviation for
the following record of 6500 throws of 12 dice, 4, 5 or 6 being reckoned as a
“success”:


Successes.   Frequency.     Successes.   Frequency.
    0             1             7          1351
    1            14             8           844
    2           103             9           391
    3           302            10           117
    4           711            11            21
    5          1231            12             3
    6          1411
                              Total        6500




19.2. (Quetelet, “Lettres . . . sur la théorie des probabilités.”)

Balls were drawn from a bag containing equal numbers of black and white 
balls, each ball being returned before drawing another. The records were then 
grouped by counting the number of black balls in consecutive 2’s, 3’s, 4’s, 5’s, 
etc. The following are the distributions so derived for grouping by 5’s, 6’s,
and 7’s. Compare actual with theoretical means and standard deviations. 


Successes.   (a) Grouping   (b) Grouping   (c) Grouping
               by Fives.      by Sixes.      by Sevens.
    0             30             17              9
    1            125             65             34
    2            277            100            104
    3            221            192            151
    4            130            100            148
    5             27             09             95
    6              -              3             40
    7              -              -              4
  Total          810            683            585


19.3. The proportion of successes in the data of Exercise 19.1 is 0.5097.
Find the standard deviation of the proportion with the given number of throws,
and state whether you would regard the excess of successes as probably significant
of bias in the dice.

19.4. In the 4096 drawings on which Exercise 19.2 is based 2030 balls were
black and 2066 white. Is this divergence probably significant of bias?

19.5. (Data from Report 1, Evolution Committee of the Royal Society, p. 17.) 
In breeding certain stocks, 408 hairy and 120 glabrous plants were obtained. 
If the expectation is one-fourth glabrous, is the divergence significant, or might 
it have occurred as a fluctuation of sampling?

19.6. 400 eggs are taken at random from a huge consignment, and 50 are
found to be bad. Estimate the percentage of bad eggs in the consignment and
assign limits within which the percentage probably lies.

19.7. In a certain association table (data from Exercise 3.5) the following 
frequencies were obtained: — 


$(AB) = 309,\quad (A\beta) = 214,\quad (\alpha B) = 132,\quad (\alpha\beta) = 119$


Can the association of the table have arisen as a fluctuation of simple sampling, 
the true association being zero? 

19.8. The sex ratio at birth is sometimes given by the ratio of male to female
births, instead of the proportion of male to total births. If Z is the ratio, i.e.
$Z = p/q$, show that the standard error of Z is approximately

$$(1 + Z)\sqrt{\frac{Z}{n}}$$

n being large, so that deviations are small compared with the mean.

19.9. In a random sample of 500 persons from town A, 200 are found to be
consumers of cheese. In a sample of 400 from town B, 200 are also found
to be consumers of cheese. Discuss the question whether the data reveal a
significant difference between A and B so far as the proportion of
cheese-consumers is concerned.

19.10. In a newspaper article of 1000 words in English 30 per cent, of the 
words are found to be of Anglo-Saxon origin. Assuming that simple sampling 
conditions hold, estimate the proportion of Anglo-Saxon words in the writer's 
vocabulary and assign limits to that proportion. 

Suggest possible causes which might break down the three conditions for 
simple sampling. 







19.11. If a series of random samples of different sizes is taken from the same 
material, show that the standard deviation of the observed proportions of
successes in such sets is s, where

$$s^2 = \frac{pq}{H}$$

and H is the harmonic mean of the numbers in the samples.

19.12. Apply the result of the previous exercise to the following data 
(A. D. Darbishire, Biometrika, vol. 3, p. 30), giving percentages to the nearest
unit of albinos obtained in 121 litters from hybrids of Japanese waltzing mice 
by albinos, crossed inter se : 


Percentage.   Frequency.     Percentage.   Frequency.
     0            40              40            3
    14             4              43            2
    17             9              50           16
    20             9              57            1
    22             1              60            8
    25            10              67            4
    29             3              80            1
    33            13             100            2


Calculate the actual standard deviation and compare it with the result given by 
the formula of the previous exercise. The expected proportion of albinos is 
25 per cent., and the sizes of the litters are given in Example 7.5, page 130. 

19.13. In a case of mice-breeding (see reference above) the harmonic mean
number in a litter was 4.735, and the expected proportion of albinos 50 per cent.
Find the standard deviation of simple sampling for the proportion of albinos in a
litter, and state whether the actual standard deviation (21.63 per cent.) probably
indicates any real variation, or not.

19.14. In the data of Table 11.0, page 202, the standard deviation of the
proportion of male births per 3 000 of all births is T-IO and the mean proportion 
of male births 509*2. The harmonic mean number of births in a district is 5070. 
Find the significant standard deviation $\sigma_p$.

19.15. If for one half of n events the chance of success is p and the chance of
failure q, whilst for the other half the chance of success is q and the chance of 
failure p , what is the standard deviation of the number of successes, the events 
being all independent? 

19.16. The following are the deaths from smallpox during the twenty years 
1882-1901 in England and Wales: - 


Year.   Deaths.      Year.   Deaths.
1882     1317        1892      431
  83      957          93     1457
  84     2234          94      820
  85     2827          95      223
  86      275          96      511
  87      506          97       25
  88     1026          98      253
  89       23          99      174
  90       16        1900       85
  91       49        1901      356


The death-rate from smallpox being very small, the rule of 19.15 may be
applied to estimate the standard deviation of simple sampling. Assuming that 
the excess of the actual standard deviation over this can be entirely accounted 
for by a correlation between the results of exposure to risk of the individuals 
composing the population, estimate r. The mean population during the period 
may be taken in round numbers as 29 millions. 



CHAPTER 20. 


THE SAMPLING OF VARIABLES- LARGE SAMPLES. 
Sampling of Variables. 

20.1. We are now able to proceed from the sampling of attributes to
the sampling of variables. Whereas in the last chapter we were interested
in the question whether a member of a sample did or did not exhibit a
particular attribute, we now have to study individuals which may take any
of the values of a variable. It will no longer be possible, therefore, for us
to classify each member of a sample under one of two heads, success or
failure; in general the values of the variate given by different trials will
be spread over a range, which may be unlimited, limited by practical
considerations, as in the case of height in human beings, or limited by
theoretical considerations, as in the case of the correlation coefficient,
which cannot be outside the range +1 to -1.

20.2. To give concreteness to our discussions we shall occasionally find
it useful to consider the sampling of variables as a kind of ticket sampling.
We may picture our universe as made up of tickets, each bearing a recorded
value of some variable X. Sampling may then be imagined to consist of
the drawing of tickets and the noting of the values of X which they bear.
In the great majority of cases with which we shall deal, X may have any
value over a continuous range, and the ticket universe is to be conceived
as being actually or practically infinite.

20.3. As in the case of attributes, our principal objects in studying
these samples will be (a) to compare observation with expectation and to
see how far deviations of one from the other can be attributed to
fluctuations of sampling; (b) to estimate from samples some characteristic of the
parent, such as the mean of a variate; and (c) to gauge the reliability of
our estimates.

In order to grasp satisfactorily the ideas and assumptions upon which
work of this kind is based, it is necessary to develop some theoretical 
considerations which have already been touched upon in the last chapter. 
This we now proceed to do. 

Sampling Distributions. 

20.4. If we take a number of samples from a universe and calculate 
some function, 1 such as the mean or the standard deviation, of each sample, 
we shall in general get a series of different values, one for each sample. If 
the number of samples is at all large, these values may be grouped in a 
frequency distribution ; and as the number of samples becomes larger, 
this distribution will approach the “ ideal ” form of a continuous curve. 
Such a distribution is called a sampling distribution. 

1 Quantities such as means, standard deviations, moments, correlation coefficients
and so forth will be referred to generically as “parameters.”



20.5. As an illustration, consider the universe of 8585 men, classified
according to height, of Table 6.7, page 94. In Chapter 18 we showed how
to draw a random sample of 10 individuals from this universe, and for one
sample we calculated the mean. The following table shows the 100 values
of the sample mean obtained by taking 100 such samples arranged in the
form of a frequency table:

Table 20.1. Frequency Distribution of Means of Samples of 10 from the Universe
of the last column of Table 6.7, page 94.

Value of Mean in        Number of Samples with
Sample (inches).        Specified Values of the Mean.
    64.4-                        1
    64.8-                        0
    65.2-                        1
    65.6-                       11
    66.0-                       12
    66.4-                       16
    66.8-                       22
    67.2-                       18
    67.6-                       14
    68.0-                        4
    68.4-                        1
    Total                      100

This distribution is not very regular, owing to the smallness of the total 
frequency. 
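The construction of such a table can be imitated in a short simulation. This is only a sketch: the universe below is a hypothetical stand-in (8585 values drawn from a normal curve with assumed mean 67.0 and standard deviation 2.5 inches), not the data of Table 6.7, and the function names are illustrative.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Means of n_samples random samples drawn with replacement."""
    rng = random.Random(seed)
    return [statistics.mean(rng.choices(population, k=sample_size))
            for _ in range(n_samples)]

def frequency_table(values, start, width):
    """Count values in classes [start + k*width, start + (k+1)*width)."""
    counts = {}
    for v in values:
        k = int((v - start) // width)
        lower = round(start + k * width, 1)
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical universe standing in for the 8585 heights.
rng = random.Random(1)
population = [rng.gauss(67.0, 2.5) for _ in range(8585)]

means = sample_means(population, sample_size=10, n_samples=100)
table = frequency_table(means, start=64.4, width=0.4)
for lower, count in table.items():
    print(f"{lower:5.1f}-  {count}")
```

Raising `n_samples` should give a smoother table, in line with the remark that the irregularity is due to the smallness of the total frequency.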

20.6. As a second illustration we take some data obtained by random 
sampling with Tippett’s numbers from a bivariate normal universe with 
correlation +0.9. 500 samples of 10 were taken and the correlation coefficient
of each sample worked out. The frequency distribution of the
500 values was as follows (data adapted from P. R. Rider, “Distribution
of Correlation Coefficient in Small Samples,” Biometrika, vol. 24, 1932,
p. 382):

Table 20.2. Frequency Distribution of Correlation Coefficients in Samples
of 10 from a Normal Universe.

Value of r in Sample.     Frequency.
   -0.1 to 0.0                 2
    0.0 to 0.1                 0
    0.1 to 0.2                 0
    0.2 to 0.3                 2
    0.3 to 0.4                 4
    0.4 to 0.5                 7
    0.5 to 0.6                30
    0.6 to 0.7                44
    0.7 to 0.8               102
    0.8 to 0.9               178
    0.9 to 1.0               131
    Total                    500







Here the distribution is more regular, the number of samples being five 
times as large. In general we expect that as the number of samples
increases, the distribution will tend more and more to a continuous curve. 

Use of the Sampling Distribution. 

20.7. Let us suppose that we are given the sampling distribution
of a parameter, and that the frequency (y) may be represented in terms
of the variate (x) by a continuous curve,

$$y = F(x)$$

The frequency with which a given value $x_0$ of the parameter occurs in
a large number of samples will be represented by the ordinate of the
curve at the point whose abscissa is $x_0$. We have had an example of
this in the normal curve.

The number of samples which give a value of x greater than $x_0$ will be
represented by the area to the right of the ordinate at $x_0$; the number
giving a value less than $x_0$ will be represented by the remaining area to
the left.

Hence, the chance that any sample chosen at random from all possible
samples will give a value of x greater than $x_0$ is given by the area to the
right of the ordinate at $x_0$ divided by the total area of the curve, which
represents the total number of samples; and the chance that the sample
will give a value of x less than $x_0$ is given by the area to the left of the
ordinate at $x_0$ divided by the total area.

Similarly, the chance that a sample would give a value of x lying
between, say, $x_1$ and $x_2$ is the area lying between the ordinates at the points
$x_1$ and $x_2$ divided by the total area.

20.8. In Chapter 10 we referred to the fact that areas could be expressed
in the notation of the integral calculus. In fact, we may write the area
of the curve between $x_1$ and $x_2$ as

$$\int_{x_1}^{x_2} F(x)\,dx$$

and hence we may express P, the probability that a sample will give a
value between $x_1$ and $x_2$, as

$$P = \frac{\int_{x_1}^{x_2} F(x)\,dx}{\int_{-\infty}^{+\infty} F(x)\,dx}$$

where we assume the extreme limits to be $\pm\infty$ as in the normal curve.
In particular, the probability that the sample will give a value of x greater
than $x_0$ is given by

$$P = \frac{\int_{x_0}^{+\infty} F(x)\,dx}{\int_{-\infty}^{+\infty} F(x)\,dx}$$

As a rule, we can choose our units so that the area of the curve is unity.
This simplifies the above expressions; for the denominator, being equal
to unity, may be omitted.
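The ratio of areas can be approximated numerically. The sketch below uses the trapezoidal rule, takes an unnormalised bell-shaped curve as a stand-in for F(x), and replaces the infinite limits by -10 and +10; all three choices, and the function names, are assumptions made for illustration.

```python
import math

def area(F, a, b, steps=10_000):
    """Trapezoidal approximation to the area under F between a and b."""
    h = (b - a) / steps
    total = 0.5 * (F(a) + F(b))
    for i in range(1, steps):
        total += F(a + i * h)
    return total * h

def probability_between(F, x1, x2, lower, upper):
    """Area between x1 and x2 divided by the total area of the curve."""
    return area(F, x1, x2) / area(F, lower, upper)

def bell(x):
    """An unnormalised normal curve; its total area is not unity."""
    return math.exp(-0.5 * x * x)

p = probability_between(bell, -1.0, 1.0, -10.0, 10.0)
print(round(p, 4))
```

Because the curve is divided by its own total area, the result does not depend on the units chosen for the ordinate, which is the point of the closing remark of 20.8.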





20.9. Now let us suppose that, knowing the form of the sampling
distribution and hence being able to calculate P for any given $x_0$, we take
a sample and find that it gives a very low value of P. We are then faced
with three possibilities: either a very improbable event has occurred;
or the assumptions on which we obtained the sampling distribution were
incorrect; or there is something wrong with our sampling technique.
Which of these explanations we adopt is to some extent a matter of choice,
but if we have tested our sampling, or on other grounds have no reason
to suspect it, we shall, as a rule, be led to query the hypotheses on which
the sampling distribution was obtained.

This, in effect, is what we did in the previous chapter. It so happens
that in the simple sampling of attributes we know that the exact form
of the sampling distribution is $N(q + p)^n$, where p is the chance of success.
Without examining this distribution too closely we can say that only a
very small part of it lies outside the range $\pm 3\sigma$. Hence, if we find a
sample giving a value outside the range $\pm 3\sqrt{npq}$, we suspect the hypothesis
on which the distribution was based; and this, unless we prefer to suppose
that our sampling was not in fact simple, leads us to suspect the value of
p, which completely determines the sampling distribution.

20.10. In the previous chapter we regarded the probability of a
sample giving a value differing by more than $3\sigma$ from the mean value as
so remote that in every case we should be justified in looking for some
definite cause of the discrepancy. This is only a conventional range,
based upon the empirical fact that in most single-humped universes it
includes nearly all the members; but it is a convenient one to take and
we shall use it again below. For certain purposes, however, we might
be prepared to use a narrower range which, though not giving such a
small probability that a sample lay outside it, yet indicated considerable
improbability in the divergence of observation from expectation, and
enabled us to criticise the validity of our hypotheses with some degree of
assurance. We give one or two examples below.

20.11. In practice nearly all the sampling distributions we have to
consider are based on simple sampling. It is therefore convenient to
speak briefly of a “sampling distribution,” meaning thereby a sampling
distribution obtained under simple (and random) conditions.

Example 20.1. The sampling distribution of a parameter is a normal
universe with mean 3 units and standard deviation 2 units. What is
the probability that a sample will give a value of the parameter greater
than 6 units?

Here the value 6 is three units, i.e. $1.5\sigma$, to the right of the mean.
The required probability is therefore the area of the normal curve to the
right of an ordinate $1.5\sigma$ to the right of the mean, divided by the total
area of the curve.

This ratio can be obtained at once from Table 2 of the Appendix.
We see, in fact, that the greater fraction of the area of the curve
corresponding to $x/\sigma = 1.5$ is 0.93319. The smaller fraction is therefore 0.06681,
which gives us the required probability.
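The table look-up of Example 20.1 can be reproduced without the Appendix. This sketch computes the area of the normal curve from the error function in Python's standard library; `normal_cdf` is a helper name introduced here, not something from the text.

```python
import math

def normal_cdf(z):
    """Fraction of the area of the normal curve lying to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, sd, threshold = 3.0, 2.0, 6.0
z = (threshold - mean) / sd          # deviation in units of sigma
upper_tail = 1.0 - normal_cdf(z)     # area to the right of the ordinate
print(z, round(upper_tail, 5))
```

The greater fraction quoted in the example corresponds to `normal_cdf(1.5)` and the smaller fraction to `upper_tail`.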

Example 20.2. If the sampling distribution of a parameter is normal,
with zero mean and standard deviation $\sigma$, what is the value of the
parameter such that the chances are 99 to 1 against a sample giving a
value in excess of that value?

We have to find x such that the area of the curve to the right of the
ordinate at x is 0.01, or the area to the left 0.99.

From Appendix Table 2:

If $x/\sigma = 2.3$, greater fraction of area = 0.98928
and if $x/\sigma = 2.4$, greater fraction of area = 0.99180

Hence, by simple interpolation the greater fraction is 0.99 if $x/\sigma = 2.33$
approximately, and hence the required value is $2.33\sigma$.
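The simple interpolation of Example 20.2 can be written out as below. The two tabulated values are those quoted in the example; the function name is illustrative.

```python
def interpolate_x(target, x_lo, f_lo, x_hi, f_hi):
    """Linear interpolation for the abscissa at which the area equals target."""
    return x_lo + (target - f_lo) * (x_hi - x_lo) / (f_hi - f_lo)

# Areas to the left of x/sigma = 2.3 and 2.4, as quoted from Appendix Table 2.
x_over_sigma = interpolate_x(0.99, 2.3, 0.98928, 2.4, 0.99180)
print(round(x_over_sigma, 2))
```

Linear interpolation over so short an interval is adequate to the two decimal places given in the example.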

Example 20.3. It very frequently happens in sampling inquiries
that we are interested in the probability that a sample value exceeds a
given value $x_0$ in absolute value, i.e. that it is greater than $x_0$ or less than
$-x_0$. We can ascertain this probability without much trouble from the
ordinary table of areas of the normal curve if the distribution is normal.
Consider, for instance, the data of Example 20.1. Here we found the
probability that a sample would give a value greater than $1.5\sigma$. If we
want the probability that it would give a value greater than $1.5\sigma$ in
absolute value, we have:

P = Area to right of ordinate at $1.5\sigma$
  + Area to left of ordinate at $-1.5\sigma$

Since the curve is symmetrical, the two areas in question are equal, and

$$P = 2(1 - 0.93319) = 0.13362$$

For convenience, however, we have given in Table 3 of the Appendix
the values of this probability directly in terms of $x/\sigma$. From this table
we have at once, for $x/\sigma = 1.5$,

$$P = 0.13361$$

the difference in the last place being due merely to our having multiplied
by 2 in the former value of P a quantity which was rounded up to the
nearest figure, whereas P in the latter case was calculated more accurately.
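The small discrepancy in the last place can be seen directly by computing the probability both ways. As before, `normal_cdf` is a helper introduced for this sketch and built on the standard library's error function.

```python
import math

def normal_cdf(z):
    """Fraction of the area of the normal curve lying to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sided_tail(z):
    """Probability of a deviation exceeding z sigma in absolute value."""
    return 2.0 * (1.0 - normal_cdf(abs(z)))

from_rounded_table = 2 * (1 - 0.93319)   # doubling a rounded table entry
direct = two_sided_tail(1.5)             # computed without intermediate rounding
print(round(from_rounded_table, 5), round(direct, 5))
```

Doubling the rounded entry doubles its rounding error, which accounts for the difference of one unit in the fifth decimal place.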

20.12. To apply the results of 20.7 to 20.11 in practice for the purpose
of discussing the universe from which the samples came, we require to
know two things: (a) What is the relation between the sampling
distribution and the parent distribution, and (b) what is the form, at least
approximately, of the sampling distribution of a given parameter from a
given universe?

20.13. If the sampling is to be of much use in enabling us to estimate
the value of a parameter in the parent, we should expect most of our
estimates to be somewhere near the mark, and only comparatively few to
be very far from the true value of the quantity estimated; and further, we
expect that, in general, the further the estimates are from the truth the
fewer there will be of them.

To put this more formally, we expect that the sampling distribution 
will have a peak somewhere close to the value of the parameter which 
corresponds to the true value in the parent. If it does not, the distribution
is probably biased and our samples are likely to be misleading. 

The first desideratum in our sampling is, therefore, that it shall not lead 
to a biased distribution. We have seen in Chapter 18 the difficulties of 
eliminating bias in the sampling process itself. Where, therefore, the more 
practical considerations alluded to in that chapter impose no limitation, we
must use unbiased sampling; and this means that our sampling must be
random. In this connection it must be remembered that we cannot judge
from the samples themselves whether the sampling is random or not,
though we may suspect it. Separate tests, or the use of some accredited
method, are to be recommended where practicable. 

20.14. Knowledge of the form of the sampling distribution of a parameter,
even of an approximate kind, is by no means easy to secure. We
saw that in the case of the simple sampling of attributes it was possible to
deduce the sampling distribution in an exact form. We are not always in
this fortunate position here; in fact, rarely so. The principal difficulties
are:

(a) The form of the parent universe frequently is unknown. 

(b) Even if the form of the parent is known, certain of its constants may
be unknown; for instance, we may know that a universe is normal but be
ignorant of its mean and standard deviation.

(c) If the parent is completely known, the form of the sampling
distribution can be deduced theoretically in certain circumstances, and in
particular if the sampling is simple; but in practice the mathematical
problems which arise usually are very complex, and even if they are
tractable may be of no use owing to the enormous arithmetical labour
involved in expressing a solution in serviceable form.

20.15. If the samples are small these difficulties are formidable, even 
for simple sampling. With large samples, however, we are able to make 
certain legitimate approximations and assumptions which greatly simplify 
the problem. For the rest of this chapter and in the next we shall be 
concerned solely with large samples.

Simple Sampling of Variables. 

20.16. We shall also be thinking mainly in terms of simple sampling
(19.3). It is unnecessary to recapitulate here the discussion of simple 
sampling which we gave in the previous chapter. The assumptions which 
we considered in 19.19 to 19.24 apply mutatis mutandis to the simple 
sampling of variables. 

(a) We assume that we are drawing from precisely the same record 
during the whole of the sampling ; if we picture our parent universe as a 
card universe, the chance of drawing a card with any given value X is the
same for each sample.

(b) We assume not only that we are drawing from the same record
throughout, but that each of our cards at each drawing may be regarded
quite strictly as drawn from the same record (or from identically similar
records): e.g. if our card record is contained in a series of bundles, we must
not make it a practice to take the first card from bundle number 1, the
second card from bundle number 2, and so on, or else the chance of drawing
a card with a given value of X, or a value within assigned limits, may not
be the same for each individual card at each drawing.

(c) We assume that the drawing of each card is entirely independent 
of that of every other, so that the value of X recorded on card 1, at each 
drawing, is uncorrelated with the value of X recorded on card 2, 3, 4, and
so on. It is for this reason that we spoke of the record, in 20.2, as containing
a practically infinite number of cards, for otherwise the successive
drawings at each sampling would not be independent: if the bag contains 
ten tickets only, bearing the numbers 1 to 10, and we draw the card bearing 
1, the average of the following cards drawn will be higher than the mean of 
all cards drawn ; if, on the other hand, we draw the 10, the average of the 
following cards will be lower than the mean of all cards— i.e. there will be 
a negative correlation between the number on the card taken at any one 
drawing and the card taken at any other drawing. Without making the 
number of cards in the bag indefinitely large, we can, as already pointed out 
for the case of attributes, eliminate this correlation by replacing each card
before drawing the next. 
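The negative correlation described in (c) can be verified by enumerating every ordered pair of draws from a bag of ten tickets numbered 1 to 10. This is a sketch using exhaustive enumeration rather than sampling; the helper `correlation` is written out here and is not a library function.

```python
from itertools import permutations, product

def correlation(pairs):
    """Product-moment correlation over a list of (first, second) pairs."""
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs) / n
    vx = sum((a - mx) ** 2 for a, _ in pairs) / n
    vy = sum((b - my) ** 2 for _, b in pairs) / n
    return cov / (vx * vy) ** 0.5

tickets = range(1, 11)
without_replacement = list(permutations(tickets, 2))   # second draw differs from first
with_replacement = list(product(tickets, repeat=2))    # ticket returned before next draw

r_without = correlation(without_replacement)
r_with = correlation(with_replacement)
print(round(r_without, 4), round(r_with, 4))
```

For a bag of N tickets the correlation without replacement works out at -1/(N - 1), so it fades as the bag grows, while replacing each ticket removes it whatever the size of the bag.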

Approximations in the Theory of Large Samples. 

20.17. We can now consider the approximations which are possible 
in the theory of large samples.

In the first place, since we have supposed bias to be eliminated, the 
sample values of a parameter will be grouped about the true value, and
if the samples are large, will differ by comparatively small quantities
from that value. Hence, we may take a sample value as an estimate
of the true value. That is to say, if we have a large sample (which may
consist of a number of samples run together), we may calculate the parameter
from it precisely as we should proceed if we were calculating the
parameter for the universe as a whole, and take that value as our estimate.
Thus, the mean of the sample may be taken as an estimate of the mean of 
the universe. 

20.18. This rule is not quite so obvious as it appears. Suppose, for
example, that we are estimating the standard deviation of a universe.
In accordance with the previous paragraph we should take the standard
deviation of the sample. But in calculating this quantity we should have
to use deviations, not from the true mean, but from the mean in the sample,
which may differ from the true mean and to that extent affect the value
of the estimate. We shall, in fact, see later that if $x_1, x_2, \ldots, x_n$ are
the values in the sample and $\bar{x}$ their mean, there are reasons for preferring
the estimate

$$s'^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$$

to the estimate

$$s^2 = \frac{1}{n}\sum (x - \bar{x})^2$$

for the variance. If n is large, however, the difference is unimportant; we can
ignore it until we come to deal with small samples.
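The two estimates of 20.18 differ only in the divisor, and their ratio is n/(n - 1). A minimal sketch, with a function name and sample values chosen purely for illustration:

```python
def variance_estimates(values):
    """Return (sum of squares / (n - 1), sum of squares / n) about the sample mean."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return sum_sq / (n - 1), sum_sq / n

# Hypothetical sample of five values.
with_n_minus_1, with_n = variance_estimates([1, 2, 3, 4, 5])
print(with_n_minus_1, with_n)

# The ratio of the two estimates approaches unity as n grows.
for n in (5, 50, 5000):
    print(n, n / (n - 1))
```

For samples of a few thousand the two estimates agree to about three significant figures, which is why the distinction can be set aside in the theory of large samples.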

20.19. Secondly, as in the case of attributes, we can use these
estimates in calculating the constants of the sampling distribution, since
they differ only by small quantities from the real values. We saw, for
instance, that we were justified in taking the value of p in a large sample
in calculating the standard deviation $\sqrt{npq}$ of the sampling distribution.
We shall find that the standard deviation of the sampling distribution of
the mean of samples from a normal universe involves the standard deviation
of the parent; and in this case we can evaluate that quantity by using
the standard deviation of the sample in place of the unknown standard
deviation of the parent.

20.20. Finally, it is a very remarkable fact that the sampling
distributions of many parameters, obtained under simple sampling conditions,
tend for large samples to a single-humped form either exactly or very
closely normal. The evidence for this statement is partly theoretical,
partly experimental. It may be shown that, for simple samples from a
normal universe, the sampling distributions of most parameters are exactly
normal for large samples; some, in fact, are normal for small samples.
Following up this work, a number of experiments has been carried out on
universes which are not normal; and it appears that the parent can deviate
quite markedly from the normal form without affecting the normality of
the sampling distribution to any great extent provided, as before, that the
samples are large.

In most of our work we shall not require to assume that the sampling
distribution is normal. It will be sufficient to assume that a range of $3\sigma$
on each side of the mean includes the major portion of the distribution,
and we can confidently take this to be so unless the parent exhibits very
marked skewness.
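An experiment of the kind mentioned in 20.20 can be sketched by drawing large samples from a markedly skew parent and counting how many sample means fall within three standard deviations of their own mean. The exponential parent, the sample size of 100, the 2000 repetitions and the seed are all assumptions made for this illustration.

```python
import random
import statistics

def sampling_distribution_of_mean(draw, sample_size, n_samples, seed=0):
    """Means of n_samples simple samples, each of sample_size values from draw(rng)."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(sample_size))
            for _ in range(n_samples)]

# A skew parent: exponential with mean 1 (assumed for illustration).
means = sampling_distribution_of_mean(lambda rng: rng.expovariate(1.0),
                                      sample_size=100, n_samples=2000)

centre = statistics.fmean(means)
spread = statistics.pstdev(means)
inside = sum(abs(m - centre) <= 3 * spread for m in means) / len(means)
print(round(centre, 3), round(spread, 3), round(inside, 4))
```

Nearly all the means should fall inside the range, and the spread should be close to the parent standard deviation divided by the square root of the sample size, even though the parent itself is far from normal.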

20.21. It will now be apparent that the difficulties we specified in 
20.14 have to a great extent been met. Provided that we know the 
parent distribution to be not unduly skew, we need not know its exact 
form; and the sampling distribution can be represented satisfactorily, if 
not exactly specified, by a mean and standard deviation which may be 
estimated from the data of the sample. 

Standard Error. 

20.22. As in the last chapter, we shall refer to the standard deviation
of the sampling distribution as the standard error. In most cases we
shall be dealing with simple sampling distributions, but it is convenient
to use the term in this wider sense, although the word “error” is not
altogether appropriate in some instances. In general, as we have seen,
we are justified in taking a range of ±3 times the standard error as
determining limits outside which the value of the parameter given by a sample
probably does not lie. We can therefore use the standard error, as we
have already used it for attributes, to gauge the precision of an estimate
or to permit a judgment being made of the divergence between expected
and observed values.

In the remainder of this chapter, and in the next, we shall therefore
be concerned mainly in finding expressions for the standard errors of
the various parameters which we have to estimate. Their use we shall
illustrate in examples as we go along. In certain cases we shall also
consider the effect of a breakdown in the conditions of simple sampling.

Standard Error of a Percentile, Quartile and Median. 

20.23. Let us first of all consider the case of percentiles, which is
intimately related to that of attributes.

Consider the distribution of a variate X in an indefinitely large sample.
(This is not necessarily the same as the distribution in the parent, owing
to the possible presence of bias; but if bias is excluded, and the sampling
is simple, it is the same as the parent form.)

Let $X_p$ be a value of X such that pN values of X in this distribution
lie above it and qN below it. Thus, if the sampling is unbiased, p = 0.1
would give us the upper decile in the indefinitely large sample, p = 0.5 the
median, and so on.

A sample of n will contain various values of X. Let the proportion
of values above $X_p$ be $p + \delta$; and let $\epsilon$ be the adjustment to be made in
$X_p$ so that the proportion of values of X above $X_p + \epsilon$ is p. The values
$\delta$ and $\epsilon$ may be regarded as sampling fluctuations.

Considering now the sample of n, we have that

    the proportion of values above $X_p$ = $p + \delta$
    the proportion of values above $X_p + \epsilon$ = $p$

Hence,

    $\delta$ = proportion of values between $X_p$ and $X_p + \epsilon$.


Now if n be large, the proportion of values between $X_p$ and $X_p + \epsilon$ in
the sample will, to a close approximation, be the proportion of values
between those quantities in the distribution of an indefinitely large
sample. Consider then this distribution and let the standard deviation
of X in it be $\sigma$. If we take the distribution as drawn to scale with unit
standard deviation and unit area, the proportion of values between $X_p$
and $X_p + \epsilon$ is the area of the curve between ordinates at the points
$X_p/\sigma$ and $(X_p + \epsilon)/\sigma$.

Now if n be large, $\epsilon$ will be small, for the value of a parameter in the
sample of n will lie close to the value in the indefinitely large sample.
Hence the area between $X_p/\sigma$ and $(X_p + \epsilon)/\sigma$ is approximately rectangular,
and if we call the ordinate $y_p$, the area will be $y_p \times \epsilon/\sigma$. Hence,

    $$\delta = \frac{\epsilon}{\sigma}\,y_p$$

or

    $$\epsilon = \frac{\sigma}{y_p}\,\delta$$

Now $\delta$ is the deviation of the observed proportions from the value p;
and from our study of attributes we know that the observed proportions
$p + \delta$ will centre round the mean p with standard deviation $\sqrt{pq/n}$.
Hence $\delta$ centres round zero mean with standard deviation $\sqrt{pq/n}$. Since
$\epsilon$ bears a constant ratio $\sigma/y_p$ to $\delta$, it follows that $\epsilon$ will be distributed
about zero mean with standard deviation

    $$\sigma_\epsilon = \frac{\sigma}{y_p}\sqrt{\frac{pq}{n}} \qquad (20.1)$$


20.24. If the distribution in an indefinitely large sample be normal,
we can take the values of $y_p$ from the tables of the ordinate of the normal
curve (Appendix Table 1). From tables carried to further places of
decimals we have, for the various values of p which correspond to the
deciles,

                          Value of $y_p$.
    Median                0.3989423
    Deciles 4 and 6       0.3863425
    Deciles 3 and 7       0.3476926
    Deciles 2 and 8       0.2799619
    Deciles 1 and 9       0.1754983
    Quartiles             0.3177766


Inserting these values of $y_p$ in equation (20.1), we have the following
values for the standard errors of the median, deciles, etc.:

                          Standard error is
                          $\sigma/\sqrt{n}$ multiplied by
    Median                1.25331
    Deciles 4 and 6       1.26804
    Deciles 3 and 7       1.31800
    Deciles 2 and 8       1.42877
    Deciles 1 and 9       1.70942
    Quartiles             1.36263


It will be seen that the influence of fluctuations of sampling on the
several percentiles increases as we depart from the median: the standard
error of the quartiles is nearly one-tenth greater than that of the median, 
and the standard error of the first or ninth decile more than one-third 
greater. 
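
The multipliers of 20.24 may be checked numerically. What follows is a minimal sketch in Python, assuming a normal parent and using the standard-library class `statistics.NormalDist`; the function name `percentile_se_multiplier` is our own and purely illustrative. It evaluates $\sqrt{pq}/y_p$, the factor by which $\sigma/\sqrt{n}$ is multiplied in equation (20.1).

```python
from math import sqrt
from statistics import NormalDist


def percentile_se_multiplier(p: float) -> float:
    """Factor multiplying sigma/sqrt(n) in equation (20.1) for a normal parent.

    p is the proportion of the distribution lying above the percentile.
    """
    q = 1.0 - p
    unit_normal = NormalDist()          # mean 0, standard deviation 1
    x_p = unit_normal.inv_cdf(q)        # abscissa with proportion q below it
    y_p = unit_normal.pdf(x_p)          # ordinate of the unit normal curve there
    return sqrt(p * q) / y_p


rows = [
    ("Median", 0.5),
    ("Deciles 4 and 6", 0.4),
    ("Deciles 3 and 7", 0.3),
    ("Deciles 2 and 8", 0.2),
    ("Deciles 1 and 9", 0.1),
    ("Quartiles", 0.25),
]
for label, p in rows:
    print(f"{label:16s} {percentile_se_multiplier(p):.5f}")
```

Since the normal curve is symmetrical, p and 1 - p give the same multiplier, which is why the deciles appear in pairs.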

20.25. Consider further the influence of the form of the
frequency-distribution on the standard error of the median, as this is an important
form of average. For a distribution with a given number of observations
and a given standard deviation the standard error varies inversely as $y_p$.
Hence for a distribution in which $y_p$ is small, for example a U-shaped
distribution, the standard error of the median will be relatively high, and
it will, in so far, be an undesirable form of average to employ. On the
other hand, in the case of a distribution which has a high peak in the
centre, so as to exhibit a value of $y_p$ large compared with the standard
deviation, the standard error of the median will be relatively low. We
can create such a “peaked” distribution by superposing a normal curve
with a small standard deviation on a normal curve with the same mean
and a relatively large standard deviation. To give some idea of the
reduction in the standard error of the median that may be effected by a
moderate change in the form of the distribution, let us find for what
ratio of the standard deviations of two such curves, having the same area,
the standard error of the median reduces to $\sigma/\sqrt{n}$, where $\sigma$ is of course
the standard deviation of the compound distribution.

Let $\sigma_1$, $\sigma_2$ be the standard deviations of the two distributions, and let
there be n/2 observations in each. Then

    $$\sigma = \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2}} \qquad (20.2)$$

On the other hand, the value of $y_p$ is

    $$y_p = \frac{\sigma}{2\sqrt{2\pi}}\left(\frac{1}{\sigma_1} + \frac{1}{\sigma_2}\right) \qquad (20.3)$$

Hence, the standard error of the median is

    $$\sqrt{\frac{2\pi}{n}}\,\frac{\sigma_1\sigma_2}{\sigma_1 + \sigma_2} \qquad (20.4)$$

(20.4) is equal to $\sigma/\sqrt{n}$ if

    $$(\sigma_1 + \sigma_2)\sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2}} = \sqrt{2\pi}\,\sigma_1\sigma_2$$

and writing $\sigma_2/\sigma_1 = \rho$, that is if

    $$(1 + \rho)\sqrt{1 + \rho^2} = 2\sqrt{\pi}\,\rho$$

or

    $$\rho^4 + 2\rho^3 + (2 - 4\pi)\rho^2 + 2\rho + 1 = 0$$

This equation may be reduced to a quadratic and solved by taking
$\rho + 1/\rho$ as a new variable. The roots found give $\rho = 2.2360\ldots$ or
$0.4472\ldots$, the one root being merely the reciprocal of the other. The
standard error of the median will therefore be $\sigma/\sqrt{n}$, in such a compound
distribution, if the standard deviation of the one normal curve is, in round
numbers, about 2¼ times that of the other. If the ratio be greater, the
standard error of the median will be less than $\sigma/\sqrt{n}$. The distribution
for which the standard error of the median is exactly equal to $\sigma/\sqrt{n}$ is
shown in fig. 20.1; it will be seen that it is by no means a very striking
form of distribution; at a hasty glance it might almost be taken as normal.
In the case of distributions of a form more or less similar to that shown,
it is evident that we cannot at all safely estimate by eye alone the relative
standard error of the median as compared with $\sigma/\sqrt{n}$.
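
The reduction of the quartic to a quadratic may be followed step by step. The sketch below, in Python, is offered only as an illustration under the assumptions of 20.25 (two normal curves of equal area and the same mean); the variable and function names are our own. Dividing the quartic by $\rho^2$ and writing $u = \rho + 1/\rho$ gives $u^2 + 2u - 4\pi = 0$, and each value of u then gives a reciprocal pair of values of $\rho$.

```python
from math import pi, sqrt

# Quadratic in u = rho + 1/rho obtained from the quartic: u**2 + 2*u - 4*pi = 0.
u = -1.0 + sqrt(1.0 + 4.0 * pi)              # the positive root

# rho then satisfies rho**2 - u*rho + 1 = 0, whose roots are reciprocals.
rho_large = (u + sqrt(u * u - 4.0)) / 2.0
rho_small = (u - sqrt(u * u - 4.0)) / 2.0


def median_se_over_mean_se(rho: float) -> float:
    """Ratio of the standard error (20.4) to sigma/sqrt(n), taking sigma_1 = 1
    and sigma_2 = rho."""
    sigma = sqrt((1.0 + rho * rho) / 2.0)    # equation (20.2)
    return sqrt(2.0 * pi) * rho / ((1.0 + rho) * sigma)


print(f"rho = {rho_large:.4f} or {rho_small:.4f}")
print(f"ratio at the root: {median_se_over_mean_se(rho_large):.6f}")
print(f"ratio for rho = 3: {median_se_over_mean_se(3.0):.4f}")
```

The last line illustrates the remark that a greater ratio of the two standard deviations brings the standard error of the median below $\sigma/\sqrt{n}$.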

20.26. In the case of a grouped frequency-distribution in which the
number of observations is large enough to give a fairly smooth distribution,
we can use an alternative form which does not involve a knowledge of the
standard deviation of the distribution in a very large sample. In fact, in
such a case the sample itself is large enough to give us a satisfactory
approximation to the distribution in an indefinitely large sample. Let $f_p$
be the frequency per class-interval at the given percentile; simple
interpolation will give us the value with quite sufficient accuracy for practical
purposes, and if the figures run irregularly they may be smoothed. Let $\sigma$ be
the value of the standard deviation expressed in class-intervals, and let n
be the number of observations as before. Then, since $y_p$ is the ordinate of
the frequency-distribution when drawn with unit standard deviation and unit
area, we must have

    $$y_p = \frac{\sigma}{n}\,f_p$$

But this gives at once for the standard error expressed in terms of the
class-interval as unit

    $$\sigma_\epsilon = \frac{\sqrt{npq}}{f_p} \qquad (20.5)$$

Fig. 20.1.


Example 20.4. Consider the data of Table 6.7, page 94, giving the
distribution of 8585 men according to height. Let us take these data to
be a sample from the universe of men in the United Kingdom at that time.
The number of observations is 8585, and the standard deviation 2.57 in.,
the distribution being approximately normal: $\sigma/\sqrt{n} = 0.027737$, and,
multiplying by the factor 1.253 . . . given in the table in 20.24, this gives
0.0348 as the standard error of the median, on the assumption of normality
of the distribution.

Using the direct method of equation (20.5), we find the median to be
67.47 (7.20), which is very nearly at the centre of the interval with a
frequency 1329. Taking this as being, with sufficient accuracy for our
present purpose, the frequency per interval at the median, the standard
error is

    $$\frac{\sqrt{8585}}{2 \times 1329} = 0.0349$$

As we should expect, the value is practically the same as that obtained
from the value of the standard deviation on the assumption of normality.

Three times the standard error is 0.1047, and we accordingly conclude
that the median in the universe lies within about 0.1 inch of 67.47, the
sample value, provided that the sampling is simple.
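
The arithmetic of Example 20.4 may be retraced as follows. This is a minimal sketch in Python using only the figures of the example (8585 observations, standard deviation 2.57 in., a frequency of 1329 per one-inch interval at the median); the variable names are our own.

```python
from math import sqrt

n = 8585            # number of observations
sigma = 2.57        # standard deviation, in inches
f_median = 1329     # frequency per one-inch class-interval at the median

# Assumption of normality: multiplier 1.25331 from the table in 20.24.
se_normal = 1.25331 * sigma / sqrt(n)

# Direct method, equation (20.5), with p = q = 1/2 for the median.
se_direct = sqrt(n * 0.5 * 0.5) / f_median

print(f"normal assumption: {se_normal:.4f}")     # 0.0348
print(f"direct method:     {se_direct:.4f}")     # 0.0349
print(f"three times the direct value: {3 * se_direct:.3f}")
```

Because the class-interval here is one inch, the direct value needs no further conversion into the unit of measurement.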

Example 20.5. Let us find the standard error of the first and ninth
deciles as another illustration. On the assumption that the distribution
is normal, these standard errors are the same, and equal to 0.027737
× 1.70942 = 0.0474. Using the direct method, we find by simple
interpolation the approximate frequencies per interval at the first and ninth
deciles respectively to be 590 and 570, giving standard errors of 0.0471
and 0.0488, mean 0.0479, slightly in excess of that found on the assumption
that the frequency is given by the normal curve. The student should
notice that the class-interval is, in this case, identical with the unit of
measurement, and consequently the answer given by equation (20.5) does
not require to be multiplied by the magnitude of the interval.

Correlation between Errors of Percentiles. 

20.27. In finding the standard error of the difference between two
percentiles in the same distribution, the student must be careful to note
that the errors in two such percentiles are not independent. Consider the
two percentiles for which the values of p and q are $p_1$, $q_1$ and $p_2$, $q_2$
respectively, the first named being the lower of the two percentiles. These two
percentiles divide the whole area of the frequency curve into three parts, the
areas of which are proportional to $q_1$, $1 - q_1 - p_2$, and $p_2$. Further, since
the errors in the first percentile are directly proportional to the errors in $q_1$,
and the errors in the second percentile are directly proportional but of
opposite sign to the errors in $p_2$, the correlation between errors in the two
percentiles will be the same as the correlation between errors in $q_1$ and $p_2$,
but of opposite sign. But if there be a deficiency of observations below the
lower percentile, producing an error $\delta_1$ in $q_1$, the missing observations will
tend to be spread over the two other sections of the curve in proportion to
their respective areas, and will therefore tend to produce an error

    $$\delta_2 = -\frac{p_2}{p_1}\,\delta_1$$

in $p_2$. If, then, r be the correlation between errors in $q_1$ and $p_2$, $\epsilon_1$ and $\epsilon_2$
the respective standard errors, we have:

    $$r\,\frac{\epsilon_2}{\epsilon_1} = -\frac{p_2}{p_1}$$

Or, inserting the values of the standard errors,

    $$r = -\sqrt{\frac{p_2 q_1}{q_2 p_1}}$$

The correlation between the percentiles is the same in magnitude but
opposite in sign; it is obviously positive, and consequently

    $$\text{Correlation between errors in two percentiles} = \sqrt{\frac{p_2 q_1}{q_2 p_1}} \qquad (20.6)$$

If the two percentiles approach very close together, $q_1$ and $q_2$, $p_1$ and $p_2$
become sensibly equal to one another, and the correlation becomes unity,
as we should expect.
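
Equation (20.6) is readily evaluated for any pair of percentiles. The following is a minimal sketch in Python; the function name `percentile_error_correlation` and its argument names are our own, with $q_1$ the proportion below the lower percentile and $p_2$ the proportion above the upper one, as in 20.27.

```python
from math import sqrt


def percentile_error_correlation(q1: float, p2: float) -> float:
    """Correlation between errors in two percentiles, equation (20.6).

    q1: proportion of the distribution below the lower percentile.
    p2: proportion of the distribution above the upper percentile.
    """
    if q1 + p2 > 1.0:
        raise ValueError("the lower percentile must not lie above the upper")
    p1 = 1.0 - q1
    q2 = 1.0 - p2
    return sqrt((p2 * q1) / (q2 * p1))


print(percentile_error_correlation(0.25, 0.25))     # the two quartiles
print(percentile_error_correlation(0.10, 0.10))     # first and ninth deciles
print(percentile_error_correlation(0.50, 0.4999))   # two nearly coincident percentiles
```

For the two quartiles the value is one-third, and for the extreme deciles one-ninth; the correlation falls off as the percentiles are taken farther apart.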


Standard Error of Semi-interquartile Range. 

20.28. Let us apply the above value of the correlation between
percentiles to find the standard error of the semi-interquartile range for the
normal curve. Inserting $q_1 = p_2 = \tfrac{1}{4}$, $q_2 = p_1 = \tfrac{3}{4}$, we find $r = \tfrac{1}{3}$. Hence the
standard error of the interquartile range is, applying the ordinary formula
for the standard deviation of a difference, $2/\sqrt{3}$ times the standard error
of either quartile, or the standard error of the semi-interquartile range
$1/\sqrt{3}$ times the standard error of a quartile. Taking the value of the
standard error of a quartile from the table in 20.24, we have, finally,

    Standard error of the semi-interquartile range in a normal distribution

    $$= 0.78672\,\frac{\sigma}{\sqrt{n}} \qquad (20.7)$$

Of course the standard deviation of the interquartile, or
semi-interquartile, range can readily be worked out in any particular case, using
equation (20.5) and the value of the correlation given above; it is best to
work out such standard errors from first principles, applying the usual
formula for the standard deviation of the difference of two correlated
variables (16.2).

20.29. If there is any failure of the conditions of simple sampling,
the formulae of the preceding sections cease, of course, to hold good. We
need not, however, enter again into a discussion of the effect of removing
the several restrictions, for the effect on the standard error of p was
considered in detail in Chapter 19, and the standard error of any percentile is
directly proportional to the standard error of p.

Standard Error of the Arithmetic Mean.

20.30. Let us now determine the standard error of the arithmetic mean.

Suppose we note separately at each drawing the value recorded on the
first, second, third . . . and nth card of our sample. The standard
deviation of the values on each separate card will tend in the long run to be
the same, and identical with the standard deviation $\sigma$ of x in an indefinitely
large sample, drawn under the same conditions. Further, the value
recorded on each card is (as we assume) uncorrelated with that on every
other. The standard deviation of the sum of the values recorded on the
n cards is therefore $\sqrt{n}\,\sigma$, and the standard deviation of the mean of the
sample is consequently 1/nth of this; or,

    $$\sigma_m = \frac{\sigma}{\sqrt{n}} \qquad (20.8)$$

This is a most important and frequently cited formula, and the student
should note that it has been obtained without any reference to the size of
the sample or to the form of the frequency-distribution. It is therefore
of perfectly general application, if $\sigma$ be known. We can verify it against
our formula for the standard deviation of sampling in the case of attributes.
The standard deviation of the number of successes in a sample of m
observations is $\sqrt{mpq}$: the standard deviation of the total number of successes
in n samples of m observations each is therefore $\sqrt{nmpq}$: dividing by n we
have the standard deviation of the mean number of successes in the n
samples, viz. $\sqrt{mpq}/\sqrt{n}$, agreeing with equation (20.8).
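
That equation (20.8) does not depend on the form of the parent may be illustrated by a small sampling experiment. The sketch below, in Python, is only an illustration under assumptions of our own choosing: the parent is an exponential distribution of unit standard deviation (markedly skew), and the sample size, number of samples and seed are arbitrary.

```python
import random
from math import sqrt
from statistics import pstdev

random.seed(1940)      # arbitrary seed, fixed only so that the run is repeatable
n = 25                 # number in each sample (hypothetical)
trials = 20000         # number of samples drawn (hypothetical)


def sample_mean(size: int) -> float:
    """Mean of one simple sample from an exponential parent with sigma = 1."""
    return sum(random.expovariate(1.0) for _ in range(size)) / size


sample_means = [sample_mean(n) for _ in range(trials)]

observed = pstdev(sample_means)    # standard deviation of the sample means
predicted = 1.0 / sqrt(n)          # sigma / sqrt(n), equation (20.8)

print(f"observed  {observed:.4f}")
print(f"predicted {predicted:.4f}")
```

The two figures should agree to within the fluctuations of sampling of the experiment itself, although the parent is far from normal.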

Example 20.6. In the height distribution considered in Examples
20.4 and 20.5 we found that $\sigma/\sqrt{n} = 0.0277$ approximately. This is then
the standard error of the mean of the distribution.

If we regard the data as a simple sample from the universe of men in
the United Kingdom, we may take the mean, i.e. 67.46 inches, as an
estimate of the mean in the universe. Three times the standard error is very
small, 0.083 inch, and we can therefore locate the mean in the universe
with considerable accuracy.

The standard error in this case, however, gives a misleading idea as
to the accuracy attained in determining the average stature in the United
Kingdom; the sample was not chosen under conditions which gave every
individual an equal chance of being chosen.

Comparison of the Standard Errors of the Median and the Mean. 

20.31. For a normal curve the standard error of the mean is to the
standard error of the median approximately as 100 to 125 (cf. 20.24),
and in general the standard errors of the two stand in a somewhat similar
ratio for a distribution not differing largely from the normal form. For
the distribution of statures used as an illustration in Example 20.4, the
standard error of the median was found to be 0.0349; the standard error
of the mean is only 0.0277. The distribution being very approximately
normal, the ratio of the two standard errors, viz. 1.26, assumes almost
exactly the theoretical magnitude.

As such cases as these seem on the whole to be more common and
typical, we stated in 7.23 that the mean is in general less affected than
the median by errors of sampling. At the same time we also indicated the
exceptional cases in which the median might be the more stable: cases in
which the mean might, for example, be affected considerably by small
groups of widely outlying observations, or in which the
frequency-distribution assumed a form resembling fig. 20.1, but even more exaggerated
as regards the height of the central “peak” and the relative length of
the “tails.” Such distributions are not uncommon in some economic
statistics, and they might be expected to characterise some forms of
experimental error. If, in these cases, the greater stability of the median
is sufficiently marked to outweigh its disadvantages in other respects, the
median may be the better form of average to use. Fig. 20.1 represents
a distribution in which the standard errors of the mean and of the median
are the same. Further, in some experimental cases it is conceivable that
the median may be less affected by definite experimental errors, the average
of which does not tend to be zero, than is the mean; this is, of course, a
point quite distinct from that of errors of sampling.

Means of Two Samples. 

20.32. When we have two samples from some record which exhibit
different means, a very common question which we wish to ask is: Can
the difference be accounted for by sampling fluctuations, i.e. can the two
samples have come from the same universe?

If the two samples are independent and come from the same universe
under simple conditions, evidently $\epsilon_{12}$, the standard error of the difference
of their means, is given by

    $$\epsilon_{12}^2 = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right) \qquad (20.9)$$

If an observed difference exceed three times the value of $\epsilon_{12}$ given by
this formula, it can hardly be ascribed to fluctuations of sampling. If, in
a practical case, the value of $\sigma$ is not known a priori, we must substitute
an observed value, and it would seem natural to take as this value the
standard deviation in the two samples thrown together. If, however, the
standard deviations of the two samples themselves differ more than can
be accounted for on the basis of fluctuations of sampling alone (see below,
21.14), we evidently cannot assume that both samples have been drawn
from the same record: the one sample must have been drawn from a
record or a universe exhibiting a greater standard deviation than the
other. If two samples be drawn quite independently from different
universes, indefinitely large samples from which exhibit the standard
deviations $\sigma_1$ and $\sigma_2$, the standard error of the difference of their means
will be given by

    $$\epsilon_{12}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \qquad (20.10)$$

This is, indeed, the formula usually employed for testing the significance
of the difference between two means in any case; seeing that the standard
error of the mean depends on the standard deviation only, and not on the
mean, of the distribution, we can inquire whether the two universes from
which samples have been drawn differ in mean apart from any difference in
dispersion.
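
The use of equations (20.9) and (20.10) may be shown in a short computation. The sketch below, in Python, is illustrative only: the function name is our own, the means, standard deviations and numbers are hypothetical figures invented for the purpose, and the sample standard deviations are used in place of the unknown values in the universes, as 20.19 permits for large samples.

```python
from math import sqrt


def se_difference_of_means(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Standard error of the difference of two independent means, equation (20.10).

    With sd1 == sd2 this reduces to equation (20.9).
    """
    return sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Hypothetical data: mean, standard deviation and number in each sample.
mean1, sd1, n1 = 67.9, 2.6, 1200
mean2, sd2, n2 = 67.5, 2.5, 900

e12 = se_difference_of_means(sd1, n1, sd2, n2)
ratio = (mean1 - mean2) / e12

print(f"standard error of the difference: {e12:.4f}")
print(f"difference / standard error:      {ratio:.2f}")
print("exceeds three times the standard error" if abs(ratio) > 3
      else "within three times the standard error")
```

With these invented figures the difference is rather more than three times its standard error, and so could hardly be ascribed to fluctuations of simple sampling.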

20.33. If two quite independent samples be drawn from the same
universe, but instead of comparing the mean of the one with the mean
of the other we compare the mean $m_1$ of the first with the mean $m_0$ of
both samples together, the use of (20.9) or (20.10) is not justified, for
errors in the mean of the one sample are correlated with errors in the mean
of the two together. Following precisely the lines of the similar problem
in 19.29, we find that this correlation is $\sqrt{n_1/(n_1 + n_2)}$, and hence

    $$\epsilon_{01}^2 = \sigma^2\,\frac{n_2}{n_1(n_1 + n_2)} \qquad (20.11)$$

(For a complete treatment of this problem in the case of large samples
drawn from two different universes, cf. ref. (403).)

Effect on Standard Error of Mean of Breakdown of Conditions for 
Simple Sampling. 

20.34. Let us consider briefly the effect on the standard error of the
mean if the conditions of simple sampling as laid down in 20.16 cease
to apply.

If we do not draw from the same record all the time, but first draw a
series of samples from one record, then another series from another record
with a somewhat different mean and standard deviation, and so on, or if
we draw the successive samples from essentially different parts of the same
record, the standard error will be greatly increased.

For suppose we draw $k_1$ samples from the first record, for which the
standard deviation (in an indefinitely large sample) is $\sigma_1$ and the mean
differs by $d_1$ from the mean of all the records together (as ascertained by
large samples in numbers proportionate to those now taken), $k_2$ samples
from the second record, for which the standard deviation is $\sigma_2$, and the
mean differs by $d_2$ from the mean of all the records together, and so on.
Then for the samples drawn from the first record the standard error of the
mean will be $\sigma_1/\sqrt{n}$, but the distribution will centre round a value differing
by $d_1$ from the mean for all the records together; and so on for the samples
drawn from the other records. Hence, if $\sigma_m$ be the standard error of the
mean in all the records taken together, N the total number of samples,

    $$N\sigma_m^2 = \frac{S(k\sigma^2)}{n} + S(kd^2)$$

But the standard deviation $\sigma_0$ for all the records together is given by

    $$N\sigma_0^2 = S(k\sigma^2) + S(kd^2)$$

Hence, writing $S(kd^2) = Ns_m^2$,

    $$\sigma_m^2 = \frac{\sigma_0^2}{n} + \frac{n-1}{n}\,s_m^2 \qquad (20.12)$$

This equation corresponds precisely to equation (19.8), page 363. The
standard error of the mean, if our samples are drawn from different records
or from essentially different parts of the entire record, may be increased
indefinitely as compared with the value it would have in the case of
simple sampling. If, for example, we take the statures of samples of
n men in a number of different districts of England, and the standard
deviation of all the statures observed is $\sigma_0$, the standard deviation of the
means for the different districts will not be $\sigma_0/\sqrt{n}$, but will have some
greater value, dependent on the real variation in mean stature from
district to district.
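
The effect described in 20.34 is most easily seen by evaluating equation (20.12) for increasing n. The sketch below, in Python, uses hypothetical values of $\sigma_0$ and $s_m$ chosen by us merely for illustration, and a function name of our own.

```python
from math import sqrt


def se_mean_heterogeneous(sigma0: float, s_m: float, n: int) -> float:
    """Standard error of the mean from equation (20.12):
    sigma_m**2 = sigma0**2 / n + (n - 1) / n * s_m**2.
    """
    return sqrt(sigma0 ** 2 / n + (n - 1) / n * s_m ** 2)


sigma0 = 2.57    # standard deviation of all records together (hypothetical)
s_m = 0.50       # standard deviation of the means of the records (hypothetical)

print("     n   simple sampling   equation (20.12)")
for n in (10, 100, 1000, 10000):
    simple = sigma0 / sqrt(n)
    actual = se_mean_heterogeneous(sigma0, s_m, n)
    print(f"{n:6d}   {simple:15.4f}   {actual:16.4f}")
```

As n increases the value for simple sampling falls towards zero, whereas the value given by (20.12) cannot fall below $s_m$: increasing the size of the samples does nothing to remove the real variation between the records.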

20.35. If we are drawing from the same record throughout, but
always draw the first card from one part of that record, the second card
from another part, and so on, and these parts differ more or less, the
standard error of the mean will be decreased. For if, in large samples
drawn from the subsidiary parts of the record from which the several
cards are taken, the standard deviations are $\sigma_1$, $\sigma_2$, . . . $\sigma_n$, and the
means differ by $d_1$, $d_2$, . . . $d_n$ from the mean for a large sample from
the entire record, we have:

    $$\sigma_0^2 = \frac{1}{n}S(\sigma^2) + \frac{1}{n}S(d^2)$$

Hence, writing $S(d^2) = ns_m^2$,

    $$\sigma_m^2 = \frac{1}{n^2}S(\sigma^2) = \frac{\sigma_0^2}{n} - \frac{s_m^2}{n} \qquad (20.13)$$

The last equation again corresponds precisely with that given for the
same departure from the rules of simple sampling in the case of attributes
(equation (19.10), p. 365). If, to vary our previous illustration, we
had measured the statures of men in each of n different districts, and
then proceeded to form a set of samples by taking one man from each
district for the first sample, one man from each district for the second
sample, and so on, the standard deviation of the means of the samples
so formed would be appreciably less than the standard error of simple
sampling $\sigma_0/\sqrt{n}$. As a limiting case, it is evident that if the men in each
district were all of precisely the same stature, the means of all the samples
so compounded would be identical; in such a case, in fact, $\sigma_0 = s_m$, and
consequently $\sigma_m = 0$. To give another illustration, if the cards from which
we were drawing samples had been arranged in order of the magnitude of
X recorded on each, we would get a much more stable sample by drawing
one card from each successive nth part of the record than by taking the
sample according to our previous rules, e.g. shaking them up in a bag
and taking out cards blindfold, or using some equivalent process.

The result is perhaps of some practical interest. It shows that, if we
are actually taking samples from a large area, different districts of which
exhibit markedly different means for the variable under consideration, and
are limited to a sample of n observations, if we break up the whole area
into n sub-districts, each as homogeneous as possible, and take a
contribution to the sample from each, we will obtain a more stable mean by this
orderly procedure than will be given, for the same number of observations,
by any process of selecting the districts from which samples shall be taken
by chance. There may, however, be a greater risk of biased error. These
conclusions seem in accord with common sense.
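
The gain in stability from taking one member of the sample from each district may be illustrated by a sampling experiment. The sketch below, in Python, rests wholly on assumptions of our own: five districts of equal size with hypothetical mean statures, a normal distribution of common standard deviation within each, and an arbitrary seed and number of trials. It compares samples formed by taking one individual from each district with simple samples in which the district of each individual is left to chance.

```python
import random
from math import sqrt
from statistics import pstdev

random.seed(20)                                   # arbitrary, for repeatability
district_means = [60.0, 64.0, 68.0, 72.0, 76.0]   # hypothetical district means
within_sd = 2.0                                   # hypothetical s.d. within each district
n = len(district_means)                           # one member per district
trials = 20000

grand_mean = sum(district_means) / n
s_m_sq = sum((m - grand_mean) ** 2 for m in district_means) / n   # s_m squared
sigma0_sq = within_sd ** 2 + s_m_sq                               # sigma_0 squared


def one_from_each_district() -> float:
    """Mean of a sample compounded by taking one individual from every district."""
    return sum(random.gauss(m, within_sd) for m in district_means) / n


def simple_sample() -> float:
    """Mean of a simple sample: the district of each individual is chosen by chance."""
    return sum(random.gauss(random.choice(district_means), within_sd)
               for _ in range(n)) / n


sd_orderly = pstdev([one_from_each_district() for _ in range(trials)])
sd_simple = pstdev([simple_sample() for _ in range(trials)])

predicted_orderly = sqrt(sigma0_sq / n - s_m_sq / n)   # equation (20.13)
predicted_simple = sqrt(sigma0_sq / n)                 # equation (20.8)

print(f"one from each district: observed {sd_orderly:.3f}, predicted {predicted_orderly:.3f}")
print(f"simple sampling:        observed {sd_simple:.3f}, predicted {predicted_simple:.3f}")
```

With district means differing as widely as these, the orderly procedure gives a standard error of about one-third of that of simple sampling.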

20.36. Finally, suppose that, while our conditions (a) and (b) of
20.16 hold good, the magnitude of the variable recorded on one card
drawn is no longer independent of the magnitude recorded on another card,
e.g. that if the first card drawn at any sampling bears a high value, the next
and following cards of the same sample are likely to bear high values also.
In these circumstances, if $r_{12}$ denote the correlation between the values
on the first and second cards, and so on,

    $$\sigma_m^2 = \frac{\sigma^2}{n^2}\left[n + 2(r_{12} + r_{13} + \ldots + r_{23} + \ldots)\right]$$

There are n(n - 1)/2 correlations; and if, therefore, r is the arithmetic
mean of them all, we may write:

    $$\sigma_m^2 = \frac{\sigma^2}{n}\left[1 + r(n-1)\right] \qquad (20.14)$$

As the means and standard deviations of $x_1$, $x_2$, . . . $x_n$ are all identical,
r may more simply be regarded as the correlation coefficient for a table
formed by taking all possible pairs of the n values in every sample. If this
correlation be positive, the standard error of the mean will be increased,
and for a given value of r the increase will be the greater, the greater the
size of the samples. If r be negative, on the other hand, the standard error
will be diminished. Equation (20.14) corresponds precisely to equation
(19.12), page 366.
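
The behaviour of equation (20.14) for different values of r and n may be tabulated directly. The following is a minimal sketch in Python; the function name and the values of $\sigma$, n and r are our own and hypothetical.

```python
from math import sqrt


def se_mean_correlated(sigma: float, n: int, r: float) -> float:
    """Standard error of the mean from equation (20.14):
    sigma_m**2 = sigma**2 / n * (1 + r * (n - 1)).
    """
    factor = 1.0 + r * (n - 1)
    if factor < 0.0:
        # A mean correlation below -1/(n - 1) is impossible among n variables.
        raise ValueError("r must not be less than -1/(n - 1)")
    return sigma * sqrt(factor / n)


sigma = 1.0
for n in (5, 50, 500):
    row = [se_mean_correlated(sigma, n, r) for r in (0.0, 0.05, 0.5)]
    print(f"n = {n:3d}: " + "  ".join(f"{value:.4f}" for value in row))

# The limiting negative correlation, r = -1/(n - 1), makes the mean constant.
print(se_mean_correlated(sigma, 5, -0.25))
```

The first column (r = 0) is the simple-sampling value $\sigma/\sqrt{n}$; for a fixed positive r the ratio of the other columns to it grows with n, as stated above.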

As was pointed out in 19.35, the case when r is positive covers
the case discussed in 20.34; for if we draw successive samples from
different records, such a positive correlation is at once introduced, although
the drawings of the several cards at each sampling are quite independent of
one another. Similarly, the case discussed in 20.35 is covered by the case
of negative correlation, for if each card is always drawn from a separate
and distinct part of the record, the correlation between any two x's will
on the average be negative; if some one card be always drawn from a part
of the record containing low values of the variable, the others must on an
average be drawn from parts containing relatively high values. It is as
well, however, to keep the three cases distinct, since a positive or negative
correlation may arise for reasons quite different from those considered in
20.34 and 20.35.


SUMMARY. 

1. A knowledge of the sampling distribution of a parameter enables us 
to ascertain the probability that a given sample will exhibit a value of the 
parameter between specified limits. 

2. The sampling distribution of many parameters tends to the normal 
form, or at least a single-humped form, for large values of n , the number in 
the sample, if the sampling is simple. 

3. This fact enables us to take a range of ± 3 times the standard error 
as providing limits within which a sample value of the parameter will 
probably lie ; with the further assumption of normality of the sampling 
distribution we can determine the probability that a sample value will lie 
within any specified limits. 

4. In a large sample the values of parameters in the sample may be
taken to be estimates of the values in the universe, if the sample is simple.
Further, these values may be used instead of the values in the universe in
calculating the standard errors of the parameters.

5. The standard error of the median of a normal distribution is given by

    $$\text{s.e.} = 1.25331\,\frac{\sigma}{\sqrt{n}}$$

where $\sigma$ is the standard deviation in an indefinitely large sample and n
is the number in the sample.

6. With the same notation the standard error of the arithmetic mean is

    $$\frac{\sigma}{\sqrt{n}}$$

whatever the form of the distribution.

7. If a series of samples of n is drawn from different universes or from
different parts of a non-homogeneous universe,

    $$\sigma_m^2 = \frac{\sigma_0^2}{n} + \frac{n-1}{n}\,s_m^2$$

where $\sigma_m$ is the standard error of the mean, $\sigma_0$ is the standard deviation
in all the samples taken together, and $s_m$ is the standard deviation of means
of indefinitely large samples about the mean of all samples.

8. If samples are drawn so that each member comes from a different
section of a non-homogeneous universe,

    $$\sigma_m^2 = \frac{\sigma_0^2}{n} - \frac{s_m^2}{n}$$

where $\sigma_m$, $\sigma_0$ and $s_m$ are defined as before.

9. If there is a correlation between the results of the drawing of
successive individuals,

    $$\sigma_m^2 = \frac{\sigma^2}{n}\left[1 + r(n-1)\right]$$

where $\sigma_m$ is the standard error of the mean, $\sigma$ the standard deviation in
an indefinitely large sample, and r is the mean correlation between the
results of pairs of individuals.


EXERCISES. 

20.1. If the sampling distribution of a parameter is normal, find the
probability that a sample value will differ from the central value by more than twice
the probable error.

20.2. In the height distribution of the United Kingdom given in Table 6.7,
page 94, assumed to be normal, with mean 67.46 inches and standard deviation
2.57 inches, find the probability that an individual chosen in the same way as
the members of the distribution will lie between 5 and 6 feet in height.

20.3. For the data of the last column of Exercise 6.6, page 111, find the
standard error of the median (151.7 lbs.) and the standard errors of the two
quartiles (142.5 lbs. and 108*4 lbs.).

20.4. For the same distribution find the standard error of the
semi-interquartile range.

20.5. The standard deviation of the same distribution is 21.0 lbs. Find the
standard error of the mean and compare it with the standard error of the median
(Exercise 20.3).

20.6. Taking the values of the median and the quartiles of the marriage
distribution of Table 6.8, page 1)0, from Example 9.8, page 104, find their
standard errors.

20.7. In the same distribution the mean is 29.4 years and the standard
deviation 8 years, approximately. Find the standard error of the mean and
compare it with that of the median.

20.8. For the same distribution find the standard error of the quartiles,
assuming it to be normal with mean 29.4 years and standard deviation 8 years,
and compare your results with those obtained in Exercise 20.6.

20.9. Find the standard error of the 27th percentile of the normal 
distribution. 

20.10. (Imaginary data.) A random sample of 1000 men from the North of 
England shows their mean wage to be £2 7s. per week, with a standard deviation 
of £1 8s. A sample of 1500 men fiom the South of England gives a mean wage of 
£2 9s. per week, w ith a standard deviation of £2. Discuss the suggestion that the 
mean rate of wages varies as betw-een the two regions. 

20.11. Two universes have tlie same mean but the standard deviation of 
one is twiee that of the oilier. Show that in samples of 500 from each drawn 
under simple random conditions the difference of the means will in all probability 
not exceed 0 3or, where a is the smaller standard deviation; and assuming the 
distribution of the difference of means to he* normal, find the probability that it 
exceeds half that amount. 

20.12. A random sample of 1000 farms in a certain year gives an average 
yield of wheat of 2000 His. per acre, with a standard deviation of 192 lbs. A 
random sample of 1000 farms in the following year gives an average yield of 
2100 lbs. per acre, with a standard deviation of 221 lbs. Show that these data 
are consistent with the hypothesis that the average yields in the country as a 
whole were the same in the two years. 

Would you modify this conclusion if the farms in the second sample were the 
same as those in the first? 




20.13. Find the mean and median of the U-shaped distribution of Table 6.14,
page 106, and compare their standard errors. (For the purpose of this exercise
the median frequency may be found by simple interpolation, but this gives a
value on the high side.)

20.14. The mean of a certain normal distribution is equal to the standard
error of the mean of samples of 100 from that distribution. Find the probability
that the mean of a sample of 25 from the distribution will be negative.

20.15. If it costs a shilling to draw one member of a sample, how much would
it cost, in sampling from a universe with mean 100 and standard deviation 10,
to take sufficient members to ensure that the mean of the sample in all
probability would be within 0.01 per cent. of the true value? Find the extra cost
necessary to double the precision.

20.16. Consider the data of Table 6.7, page 94, giving the distribution of men
by height in each of the four countries which then formed part of the United
Kingdom. The means and standard deviations of the four distributions are
given in Exercise 7.1, page 131, and Exercise 8.1, page 152.

What is the standard error of the mean of a sample which consists of 400
men, 100 chosen at random from each of the four countries?



CHAPTER 21. 


THE SAMPLING OF VARIABLES LARGE SAMPLES, 
CONTINUED. 

The Problem. 

21.1. We have just considered the standard errors of the most 
important measures of location, the median and the mean, and of certain 
measures of dispersion, the percentiles and the semi-interquartile range. 
We now proceed to discuss the standard errors of other important
parameters, including the standard deviation, moments and correlation
coefficients. All that we have said in regard to sampling distributions
generally in 20.1 to 20.22 applies equally well to this chapter; and we
shall throughout the following sections be thinking of simple sampling
unless we state explicitly to the contrary.

Standard Errors of Moments . 1 

21.2. The data from which we calculate the moments are arranged
into a certain number of groups. Suppose there are $m$ such groups, and
that the expected frequencies falling into them are $y_1, y_2, \ldots, y_m$, where
$y_1 + y_2 + \ldots + y_m = S(y) = n$, $n$ being the number in the sample. The
expected frequencies are, by definition, proportional to the frequencies in
the various groups in a very large sample; and these, if the sampling is
unbiased, are proportional to the frequencies in the various groups of the
parent universe.

Let us in the first place recapitulate some of our earlier work by finding
the standard error of one of the frequencies, say $y_s$, due to fluctuations of
sampling.

The probability that an individual chosen from the universe falls into
the $s$th group is $y_s/n$. The probability that it does not is $1 - y_s/n$. For $n$
individuals the distribution of frequencies is given by the binomial

$$\left\{\left(1 - \frac{y_s}{n}\right) + \frac{y_s}{n}\right\}^n$$

with an expected value $y_s$ and a standard deviation

$$\sqrt{n \cdot \frac{y_s}{n}\left(1 - \frac{y_s}{n}\right)}$$

Now, if the sample is large, we can take the observed frequency in the
$s$th group in calculating the standard error of the frequency of that group.

1 The student whose main interest lies in the practical application of the results of 
this chapter may prefer to omit paragraphs 21.2 to 21.8. 



Taking this observed frequency as our estimate of $y_s$, its standard error
is given by

$$\sigma_{y_s}^2 = y_s\left(1 - \frac{y_s}{n}\right) \qquad (21.1)$$


This, in another form, is our familiar result for the sampling of 
attributes. 

21.3. We may now find the correlation between errors in $y_s$ and errors
in another group-frequency, say $y_t$. It is evident that such a correlation
will exist, for if $y_s$ falls below its expected value, some other frequencies
must be increased.

We shall write a deviation of $y_s$ as $\delta y_s$. (The symbol $\delta$ is not to be
regarded as a number multiplying $y_s$, but is to be read together with $y_s$ so
that $\delta y_s$ is a single symbol representing a single quantity.)

Since

$$S(y) = y_1 + y_2 + \ldots + y_m = n$$

$$S(\delta y) = \delta y_1 + \delta y_2 + \ldots + \delta y_m = 0$$

for the sum of deviations from the expected values must be zero.

We may now assume that, on the average, a deficiency $\delta y_s$ in $y_s$ will be
spread over the remaining groups in proportion to the expected frequencies
in those groups, i.e. that

$$\delta y_t = -\frac{y_t}{n - y_s}\,\delta y_s$$

Hence,

$$\delta y_s\,\delta y_t = -(\delta y_s)^2\,\frac{y_t}{n - y_s} \qquad (21.2)$$

Now let us sum both sides of this equation for all values of the deviations
$\delta y_s$ and $\delta y_t$. By definition we shall get

$$\sigma_{y_s}\sigma_{y_t}r_{y_s y_t} = -\sigma_{y_s}^2\,\frac{y_t}{n - y_s}$$

where $r_{y_s y_t}$ is the coefficient of correlation between $\delta y_s$ and $\delta y_t$.
Hence, in virtue of (21.1),

$$\sigma_{y_s}\sigma_{y_t}r_{y_s y_t} = -\frac{y_s y_t}{n} \qquad (21.3)$$

This is a more general case of the correlation between percentiles, which
we considered in 20.27.

Standard Error of the qth Moment about a Fixed Point.

21.4. By definition, the $q$th moment about an arbitrary point is $\mu_q'$,
where

$$n\mu_q' = S(x_s^q y_s)$$

$x$ being the variate measured from the arbitrary point.

Hence, writing as before $\delta\mu_q'$ for the deviation in $\mu_q'$ due to deviations
$\delta y_s$, we have:

$$n\,\delta\mu_q' = S(x_s^q\,\delta y_s)$$

Squaring both sides,

$$n^2(\delta\mu_q')^2 = (x_1^q\,\delta y_1 + x_2^q\,\delta y_2 + \ldots + x_m^q\,\delta y_m)^2 = S\{x_s^{2q}(\delta y_s)^2\} + 2S'(x_s^q x_t^q\,\delta y_s\,\delta y_t)$$

where $S'$ denotes summation over all values of $s$ and $t$ except those for
which $s = t$, each pair being taken once.

This equation holds for any one sample, and we have to sum it for all
samples. Carrying out this summation first (in which $s$ and $t$ are fixed),
and substituting from equations (21.1) and (21.3) on the right-hand side,
we have:

$$n^2\sigma_{\mu_q'}^2 = S\left\{x_s^{2q} y_s\left(1 - \frac{y_s}{n}\right)\right\} - \frac{2}{n}S'(x_s^q x_t^q y_s y_t) = S(x_s^{2q} y_s) - \frac{1}{n}S(x_s^q y_s)\,S(x_t^q y_t) = n\mu_{2q}' - n\mu_q'^2$$

Hence,

$$\sigma_{\mu_q'}^2 = \frac{\mu_{2q}' - \mu_q'^2}{n} \qquad (21.4)$$

Example 21.1.—Let us find the standard error of the first moment,
or mean $h$.

We have, from (21.4):

$$\sigma_h^2 = \frac{\mu_2' - h^2}{n}$$

Now $\mu_2' - h^2$ is the second moment $\mu_2$ about the mean, i.e. is $\sigma^2$.
Hence,

$$\sigma_h = \frac{\sigma}{\sqrt{n}}$$

which is the result we have already found in 20.30.

Correlation between Errors in the qth and rth Moments, both 
about the Same Fixed Point. 

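21.5. As in 21.4 we have:

$$n\,\delta\mu_q' = S(x_s^q\,\delta y_s)$$

$$n\,\delta\mu_r' = S(x_s^r\,\delta y_s)$$

Multiplying,

$$n^2\,\delta\mu_q'\,\delta\mu_r' = S\{x_s^{q+r}(\delta y_s)^2\} + S'\{(x_s^q x_t^r + x_s^r x_t^q)(\delta y_s\,\delta y_t)\}$$

and summing for all samples,

$$n^2\sigma_{\mu_q'}\sigma_{\mu_r'}r_{\mu_q'\mu_r'} = S(x_s^{q+r}\sigma_{y_s}^2) + S'\{(x_s^q x_t^r + x_s^r x_t^q)(\sigma_{y_s}\sigma_{y_t}r_{y_s y_t})\}$$

On substitution for $\sigma_{y_s}^2$ and $\sigma_{y_s}\sigma_{y_t}r_{y_s y_t}$ from (21.1) and (21.3), the
right-hand side reduces to $n\mu_{q+r}' - n\mu_q'\mu_r'$, and hence,

$$\sigma_{\mu_q'}\sigma_{\mu_r'}r_{\mu_q'\mu_r'} = \frac{\mu_{q+r}' - \mu_q'\mu_r'}{n} \qquad (21.5)$$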


Standard Errors of the Moments about the Mean. 

21.6. In 21.4 and 21.5 we have considered moments about a fixed 
point. In practice we have to deal more usually with moments about 
the mean of the sample. Since this mean is itself subject to sampling 
fluctuations, the standard errors of moments about the mean will not in 
general be the same as those about a fixed point. 

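If $h$ is the mean we have, by definition,

$$n\mu_q = S\{(x_s - h)^q y_s\} = S(x_s^q y_s) - qh\,S(x_s^{q-1} y_s) + T$$

where $T$ is written generally for an expression involving $h^2$ and higher
powers of $h$.

Now let $h$ vary to $h + \delta h$, $y_s$ vary to $y_s + \delta y_s$ and $\mu_q$ vary to $\mu_q + \delta\mu_q$.
We have:

$$n(\mu_q + \delta\mu_q) = S\{x_s^q(y_s + \delta y_s)\} - q(h + \delta h)\,S\{x_s^{q-1}(y_s + \delta y_s)\} + T$$

Subtracting the equation for $n\mu_q$,

$$n\,\delta\mu_q = S(x_s^q\,\delta y_s) - q\,\delta h\,S(x_s^{q-1} y_s) - q\,\delta h\,S(x_s^{q-1}\,\delta y_s) + U$$

$$= n\,\delta\mu_q' - nq\mu_{q-1}'\,\delta h - nq\,\delta h\,\delta\mu_{q-1}' + U$$

where $U$ will involve $h$ and higher powers. We may neglect the term in
$\delta h\,\delta\mu_{q-1}'$ as being small compared with the remaining terms. Squaring
and summing for all samples,

$$\sigma_{\mu_q}^2 = \sigma_{\mu_q'}^2 + q^2\mu_{q-1}'^2\sigma_h^2 - 2q\mu_{q-1}'\sigma_{\mu_q'}\sigma_h r_{\mu_q' h} + U$$

Substituting for $\sigma_{\mu_q'}$, etc., from (21.4) and (21.5),

$$\sigma_{\mu_q}^2 = \frac{\mu_{2q}' - \mu_q'^2 + q^2\mu_2'\mu_{q-1}'^2 - 2q\mu_{q-1}'\mu_{q+1}'}{n} + U$$

Now put $h = 0$. $U$ vanishes and the moments become moments about
the mean and may therefore be written without dashes. Hence,

$$\sigma_{\mu_q}^2 = \frac{\mu_{2q} - \mu_q^2 + q^2\mu_2\mu_{q-1}^2 - 2q\mu_{q-1}\mu_{q+1}}{n} \qquad (21.6)$$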
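
The general result (21.6) is easily applied to a grouped frequency distribution. The sketch below is only an illustration under stated assumptions: the function names are ours, the data are hypothetical, and the observed moments are used in place of the moments of the universe, as the text allows for large samples.

```python
import math

def central_moment(xs, ys, q):
    """q-th moment about the mean for grouped data (values xs, frequencies ys)."""
    n = sum(ys)
    mean = sum(x * y for x, y in zip(xs, ys)) / n
    return sum((x - mean) ** q * y for x, y in zip(xs, ys)) / n

def se_central_moment(xs, ys, q):
    """Large-sample standard error of mu_q from equation (21.6)."""
    n = sum(ys)
    mu = lambda k: central_moment(xs, ys, k)
    var = (mu(2 * q) - mu(q) ** 2
           + q ** 2 * mu(2) * mu(q - 1) ** 2
           - 2 * q * mu(q - 1) * mu(q + 1)) / n
    return math.sqrt(var)

# Hypothetical grouped sample (not data from the text).
xs = [0, 1, 2, 3, 4]
ys = [10, 40, 60, 40, 10]
print(se_central_moment(xs, ys, 2), se_central_moment(xs, ys, 3))
```

For q = 2 the last two terms vanish because the first moment about the mean is zero, leaving the familiar expression for the variance.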


Correlation between Two Moments Both Measured about the 
Mean. 

21.7. In a similar way it may be shown that

$$\sigma_{\mu_q}\sigma_{\mu_r}r_{\mu_q\mu_r} = \frac{\mu_{q+r} - \mu_q\mu_r + qr\mu_2\mu_{q-1}\mu_{r-1} - r\mu_{q+1}\mu_{r-1} - q\mu_{q-1}\mu_{r+1}}{n} \qquad (21.7)$$

We omit the algebra for the sake of brevity.




Correlation between Errors in a Moment about a Fixed Point and 
in a Moment about the Mean. 

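21.8. Let us first of all find the correlation between deviations in a
group-frequency $y_t$ and the moment $\mu_q'$ about a fixed point. We have:

$$n\mu_q' = S(x_s^q y_s)$$

Hence,

$$n\,\delta\mu_q'\,\delta y_t = \delta y_t\,S(x_s^q\,\delta y_s) = x_t^q(\delta y_t)^2 + S'(x_s^q\,\delta y_s\,\delta y_t)$$

the summation $S'$ being taken over all values of $s$ except $s = t$.

Hence, summing for all samples,

$$n\sigma_{\mu_q'}\sigma_{y_t}r_{\mu_q' y_t} = x_t^q y_t\left(1 - \frac{y_t}{n}\right) - \frac{y_t}{n}S'(x_s^q y_s) = y_t(x_t^q - \mu_q')$$

Hence,

$$\sigma_{\mu_q'}\sigma_{y_t}r_{\mu_q' y_t} = \frac{y_t}{n}(x_t^q - \mu_q') \qquad (21.8)$$

Similarly, for the product-sum of deviations in $y_t$ and the moment $\mu_q$
about the mean, we have:

$$\sigma_{\mu_q}\sigma_{y_t}r_{\mu_q y_t} = \frac{y_t}{n}\{(x_t^q - \mu_q') - q x_t\mu_{q-1}'\} + \text{terms in } h \text{ and higher powers}$$

Putting $h = 0$, the right-hand side reduces to

$$\frac{y_t}{n}(x_t^q - \mu_q - q x_t\mu_{q-1}) \qquad (21.9)$$

For the product-sum of errors in $\mu_q'$ and $\mu_r$:

$$n\,\delta\mu_q' = S(x_s^q\,\delta y_s)$$

$$\delta\mu_r = \delta\mu_r' - r\,\delta h\,\mu_{r-1}' + U$$

where $U$, as before, denotes an expression involving $h$ and higher powers.
Hence,

$$n\,\delta\mu_q'\,\delta\mu_r = S(x_s^q\,\delta y_s\,\delta\mu_r') - S(x_s^q\,\delta y_s\,r\,\delta h\,\mu_{r-1}') + U$$

Summing for all deviations,

$$n\sigma_{\mu_q'}\sigma_{\mu_r}r_{\mu_q'\mu_r} = S(x_s^q\sigma_{y_s}\sigma_{\mu_r'}r_{y_s\mu_r'}) - r\mu_{r-1}'\,S(x_s^q\sigma_{y_s}\sigma_h r_{y_s h}) + U$$

and substituting from (21.8) and (21.9) the right-hand side, on division
by $n$, becomes

$$\frac{\mu_{q+r}' - \mu_q'\mu_r'}{n} - \frac{r\mu_{q+1}'\mu_{r-1}'}{n} + U$$

Put $h = 0$. Then,

$$\sigma_{\mu_q'}\sigma_{\mu_r}r_{\mu_q'\mu_r} = \frac{\mu_{q+r} - \mu_q\mu_r - r\mu_{q+1}\mu_{r-1}}{n} \qquad (21.10)$$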




Use of Sheppard’s Corrections in Evaluating Standard Errors. 

21.9. Theoretically, Sheppard’s corrections for grouping are not to be
used in evaluating the moments which enter into the general equations for
standard errors obtained in the previous sections. For, as the corrected
values differ from the uncorrected values only by constants depending on
the width of the interval, the sampling deviations of corrected and
uncorrected moments are equal, and hence so are their standard errors. But
the standard errors of uncorrected moments are given by the equations we
have obtained in the foregoing section, and hence those equations are
applicable to corrected moments provided that the uncorrected values are
used in them.

In practice, however, it seems to make very little difference which
moments we use, unless the sample is very large indeed. But as the
uncorrected values have to be obtained before the corrected values can be
calculated, and are therefore usually available, it is as well to use the
uncorrected values wherever possible.


Standard Error of the Variance. 

21.10. Armed with the general results of the foregoing sections, the
methods of which are due to Karl Pearson (ref. (100)), we can discuss the
standard errors of a large class of parameters.

From equation (21.6), putting $q = 2$, we have, since $\mu_1 = 0$,

$$\sigma_{\mu_2}^2 = \frac{\mu_4 - \mu_2^2}{n} \qquad (21.11)$$

which gives the standard error of the variance $\mu_2$.
If the parent universe is normal,

$$\mu_4 = 3\mu_2^2 = 3\sigma^4 \qquad (10.23)$$

and hence,

$$\sigma_{\mu_2} = \mu_2\sqrt{\frac{2}{n}} = \sigma^2\sqrt{\frac{2}{n}} \qquad (21.12)$$


Standard Error of the Standard Deviation. 

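21.11. If $\mu_2$ is the variance, we have:

$$\mu_2 = \sigma^2$$

Hence,

$$\mu_2 + \delta\mu_2 = (\sigma + \delta\sigma)^2 = \sigma^2 + 2\sigma\,\delta\sigma + (\delta\sigma)^2$$

Neglecting $(\delta\sigma)^2$ in comparison with $\delta\sigma$,

$$\delta\mu_2 = 2\sigma\,\delta\sigma$$

Squaring and summing for all samples,

$$\sigma_{\mu_2}^2 = 4\sigma^2\sigma_\sigma^2$$

Hence,

$$\sigma_\sigma = \sqrt{\frac{\mu_4 - \mu_2^2}{4\mu_2 n}} \qquad (21.13)$$

If the parent distribution is normal this reduces to

$$\sigma_\sigma = \frac{\sigma}{\sqrt{2n}} \qquad (21.14)$$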


21.12. The form of equation (21.14) has been widely used for the
standard error of $\sigma$ without due regard to the nature of the parent universe,
and the student should guard against this mistake.

We have, in fact, from (21.13):

$$\sigma_\sigma = \frac{\sigma}{\sqrt{2n}}\left(1 + \frac{\beta_2 - 3}{2}\right)^{1/2}$$

How far $\sigma_\sigma$ can be taken to be the value (21.14) therefore depends on
how close the factor $\left(1 + \frac{\beta_2 - 3}{2}\right)^{1/2}$ is to unity, i.e. depends on the kurtosis
of the parent distribution.

The following table shows the value of this factor for various values
of $\beta_2$:

    beta_2     (1 + (beta_2 - 3)/2)^(1/2)
      2            0.7071
      3            1.0000
      4            1.2247
      5            1.4142
      6            1.5811
      7            1.7321
      8            1.8708
      9            2.0000

It thus appears that if the universe is leptokurtic the real standard
error is greater than that given by the assumption of normality, and may
be twice as great or even more. If the universe is platykurtic the real
standard error is less than the “normal” value.

If $\beta_2 - 3$ is small, the factor $\left(1 + \frac{\beta_2 - 3}{2}\right)^{1/2}$ is approximately $1 + \frac{\beta_2 - 3}{4}$.
This differs from unity by more than 5 per cent. if $\beta_2$ is less than 2.8 or
more than 3.2. Hence, values of $\beta_2$ lying outside the range 2.8 to 3.2 (and
they are more common than not in practice) will give an error of more than
5 per cent. if the universe is assumed to be normal.
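
The factor discussed in 21.12 can be tabulated in a few lines. This is a minimal sketch; the function name is an assumption made for illustration.

```python
import math

def sd_error_factor(beta2):
    """Ratio of the standard error of sigma from (21.13) to the normal value (21.14)."""
    return math.sqrt(1.0 + (beta2 - 3.0) / 2.0)

# Tabulate the factor for beta_2 = 2, 3, ..., 9.
for beta2 in range(2, 10):
    print(beta2, round(sd_error_factor(beta2), 4))
```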

Example 21.2.—For the height distribution of Table 6.7, page 94, we
have found that $\sigma = 2.57$ inches, $n = 8585$. The universe may be taken to
be normal, for $\beta_2$ from the sample is 3.149 (Example 9.9, page 105) and
hence the standard error of

$$\sigma = \frac{2.57}{\sqrt{2 \times 8585}} = 0.02 \text{ approximately.}$$

Hence, we may say that the s.d. in the universe almost certainly lies
in the range $2.57 \pm 0.06$, assuming that the sampling is simple.

Example 21.3.—The distribution of Australian marriages of Table 6.8,
page 96, has uncorrected moments $\mu_2$ and $\mu_4$, in class-intervals, as follows:

$$\mu_2 = 7.0570$$

$$\mu_4 = 408.7382 \quad \text{(Example 9.2, page 159.)}$$

Hence,

$$\sigma = \sqrt{\mu_2} = 2.6565$$

The standard error of

$$\sigma = \sqrt{\frac{\mu_4 - \mu_2^2}{4\mu_2 n}} = \sqrt{\frac{408.7382 - (7.0570)^2}{4 \times 7.0570 \times 301{,}785}} = 0.00649 \text{ class-intervals}$$

As we should expect from such a large sample, the standard error is
very small, and we conclude that the standard deviation of the parent
lies in the range $2.6565 \pm 0.0195$.

It may be pointed out that if we take these data as a sample of
Australian marriages in general, we may be violating the conditions of
simple sampling, for the distribution most likely changes from year to
year.

Example 21.4.—In the previous example we worked throughout with
uncorrected values. The corrected moments (Example 9.1, page 160) are:

$$\mu_2 = 6.9736$$

$$\mu_4 = 405.2389$$

We then have, for the corrected value of $\sigma$,

$$\sigma = \sqrt{6.9736} = 2.641 \text{ approximately}$$

But the standard error of $\sigma$ is 0.00649 as in the previous example, for we
must use the uncorrected values in calculating it.

As a matter of fact, if we had used the corrected values we should
have found the value 0.00651, a practically negligible difference even for a
sample of this size.

Finally, let us compare this value with that given by the assumption
of normality. We have:

$$\frac{\sigma}{\sqrt{2n}} = \frac{2.6565}{\sqrt{603{,}570}} = 0.00342 \text{ class-intervals}$$

i.e. only about half the true value. This is in accordance with the table
of page 400, for $\beta_2$ is over 8.
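
The arithmetic of Examples 21.3 and 21.4 may be checked with the short sketch below. The function names are assumptions introduced for illustration; the moments and the sample number are those quoted in the examples.

```python
import math

def se_sd(mu2, mu4, n):
    """Equation (21.13): standard error of sigma from the second and fourth moments."""
    return math.sqrt((mu4 - mu2 ** 2) / (4.0 * mu2 * n))

def se_sd_normal(sigma, n):
    """Equation (21.14): standard error of sigma for a normal universe."""
    return sigma / math.sqrt(2.0 * n)

n = 301785
mu2, mu4 = 7.0570, 408.7382           # uncorrected moments of Example 21.3
sigma = math.sqrt(mu2)

print(round(se_sd(mu2, mu4, n), 5))       # general formula
print(round(se_sd_normal(sigma, n), 5))   # value on the assumption of normality
print(round(mu4 / mu2 ** 2, 2))           # beta_2 of the sample
```

The ratio of the two standard errors agrees with the factor tabulated in 21.12 for a value of beta_2 a little over 8.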


Comparative Effects of Sampling Fluctuations and Corrections 
for Grouping. 

21.13. Writing temporarily $\sigma_1^2$ for the uncorrected value of the
variance and $\sigma_2^2$ for the corrected value, we have:

$$\sigma_2^2 = \sigma_1^2 - \frac{h^2}{12}$$

or

$$\sigma_2 = \sigma_1\left(1 - \frac{h^2}{12\sigma_1^2}\right)^{1/2}$$

If the class-interval is chosen so as to make the number of intervals $d$,
then $6\sigma_1$ would be about $dh$ and $h/\sigma_1$ about $6/d$. Hence,

$$\frac{\sigma_2}{\sigma_1} = \left(1 - \frac{3}{d^2}\right)^{1/2}$$

or, since $3/d^2$ is small,

$$\frac{\sigma_2}{\sigma_1} = 1 - \frac{3}{2d^2}$$

For instance, if $d$ is 20, the corrected value is about 0.375 per cent. less
than the uncorrected value.

Now, for a normal universe,

$$\sigma_\sigma = \frac{\sigma}{\sqrt{2n}}$$

and if $n$ is, say, 1000, the standard error is $\sigma/44.72 = 0.0224\sigma = 2.24$ per cent.
of $\sigma$. Thus Sheppard’s correction amounts to no more than about one-sixth
of the standard error, and to make it gives an almost misleading
idea of precision in most practical cases.

It was for this reason that we recommended (8.11 and 11.29) that the
Sheppard corrections should not be applied if the total frequency is less
than 1000. On the other hand, in Examples 21.3 and 21.4 the correction
is large compared with the standard error and can reasonably be made,
owing to the largeness of the sample.
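
The comparison made in 21.13 can be repeated for other numbers of class-intervals and other sample sizes. The sketch below assumes a normal universe and a range of about six standard deviations, as in the text; the function names are ours.

```python
import math

def sheppard_relative_correction(d):
    """Approximate fractional reduction 3 / (2 d^2) in sigma for d class-intervals."""
    return 3.0 / (2.0 * d ** 2)

def relative_se_sd_normal(n):
    """Standard error of sigma as a fraction of sigma, from (21.14)."""
    return 1.0 / math.sqrt(2.0 * n)

d, n = 20, 1000
correction = sheppard_relative_correction(d)
rel_se = relative_se_sd_normal(n)
print(correction, rel_se, correction / rel_se)
```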


Comparison of Standard Deviations of Two Samples. 

21.14. As in 20.32, where we considered the comparison of the means
of two samples, if the samples are independent and come from the same
universe the standard error of the difference of their standard deviations
is given by

$$\epsilon_{12}^2 = \frac{\mu_4 - \mu_2^2}{4\mu_2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right) \qquad (21.15)$$

where $n_1$, $n_2$ are the numbers in the samples, or, if the universe be normal,

$$\epsilon_{12}^2 = \frac{\sigma^2}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right) \qquad (21.16)$$

If the two samples are drawn from different universes with constants
$\mu_2$, $\mu_4$ and $\nu_2$, $\nu_4$, the standard error of the difference of the standard
deviations is given by

$$\epsilon_{12}^2 = \frac{\mu_4 - \mu_2^2}{4\mu_2 n_1} + \frac{\nu_4 - \nu_2^2}{4\nu_2 n_2} \qquad (21.17)$$

or

$$\epsilon_{12}^2 = \frac{\sigma_1^2}{2n_1} + \frac{\sigma_2^2}{2n_2} \qquad (21.18)$$

if the universe be normal.

Again, if the standard deviation of one sample is compared with the
standard deviation of the two samples when pooled, the standard error of
the difference is, if the distribution be normal,

$$\epsilon^2 = \frac{\sigma^2}{2}\,\frac{n_2}{n_1(n_1 + n_2)} \qquad (21.19)$$

These results can be used to test the significance of differences between
standard deviations precisely as the equations of 20.32 and 20.33 were
used to test the significance of differences between means.
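
A comparison of two standard deviations on the lines of (21.18) might be sketched as follows. The two samples are hypothetical and both universes are assumed normal; the function name is ours.

```python
import math

def se_diff_sd_normal(s1, n1, s2, n2):
    """Equation (21.18): standard error of the difference of two standard
    deviations, the universes being assumed normal."""
    return math.sqrt(s1 ** 2 / (2.0 * n1) + s2 ** 2 / (2.0 * n2))

# Hypothetical samples (not data from the text).
s1, n1 = 12.0, 800
s2, n2 = 13.0, 1000
eps = se_diff_sd_normal(s1, n1, s2, n2)
ratio = (s2 - s1) / eps     # difference in units of its standard error
print(round(eps, 4), round(ratio, 2))
```

A ratio below 3 would not ordinarily be regarded as establishing a real difference between the two standard deviations.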


Standard Error of Third and Fourth Moments about the Mean. 

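21.15. From equation (21.6), putting $q = 3$,

$$\sigma_{\mu_3}^2 = \frac{\mu_6 - \mu_3^2 - 6\mu_2\mu_4 + 9\mu_2^3}{n} \qquad (21.20)$$

If the distribution is normal,

$$\mu_6 = 15\sigma^6, \quad \mu_4 = 3\sigma^4, \quad \mu_3 = 0, \quad \mu_2 = \sigma^2$$

Hence,

$$\sigma_{\mu_3} = \sigma^3\sqrt{\frac{15 - 18 + 9}{n}} = \sigma^3\sqrt{\frac{6}{n}} \qquad (21.21)$$

Similarly, from equation (21.6), putting $q = 4$,

$$\sigma_{\mu_4}^2 = \frac{\mu_8 - \mu_4^2 - 8\mu_3\mu_5 + 16\mu_2\mu_3^2}{n} \qquad (21.22)$$

If the distribution is normal, $\mu_8 = 105\sigma^8$, $\mu_5 = 0$.
Hence,

$$\sigma_{\mu_4} = \sigma^4\sqrt{\frac{105 - 9}{n}} = \sigma^4\sqrt{\frac{96}{n}} \qquad (21.23)$$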





Example 21.5.—For the height distribution of Table 6.7 we have
(Example 9.1, page 156):

$$\mu_2 \text{ (uncorrected)} = 6.6168$$

$$\mu_3 \text{ (uncorrected)} = -0.2078$$

$$\mu_4 \text{ (uncorrected)} = 137.6892$$

and from Example 9.3, page 100:

$$\mu_2 \text{ (corrected)} = 6.5335$$

$$\mu_3 \text{ (corrected)} = -0.2078$$

$$\mu_4 \text{ (corrected)} = 134.4100$$

We did not calculate higher moments, and hence cannot use equations
(21.20) and (21.22) with these data. The distribution is, however,
approximately normal. Hence, from (21.21),

$$\sigma_{\mu_3} = \sigma^3\sqrt{\frac{6}{8585}} = 0.45 \text{ approximately}$$

The value of $\mu_3$ cannot therefore be judged significantly different from
zero, which is what we should expect, for we have assumed the universe to
be normal.

From (21.23) we have:

$$\sigma_{\mu_4} = \sigma^4\sqrt{\frac{96}{8585}} = 4.63 \text{ approximately}$$

These are calculated from the uncorrected value of $\sigma$. We may infer
that $\mu_4$ (corrected) lies within the range $134.41 \pm 13.89$. The Sheppard
correction is only 3.28, and is submerged in the possible sampling deviation,
even for a sample of 8585. What we have said in 21.13 applies, in fact,
a fortiori to the higher moments.
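
The two standard errors of Example 21.5 follow directly from (21.21) and (21.23). The sketch below reproduces them from the uncorrected second moment; the function names are assumptions made for illustration.

```python
import math

def se_mu3_normal(sigma, n):
    """Equation (21.21): standard error of the third moment, normal universe."""
    return sigma ** 3 * math.sqrt(6.0 / n)

def se_mu4_normal(sigma, n):
    """Equation (21.23): standard error of the fourth moment, normal universe."""
    return sigma ** 4 * math.sqrt(96.0 / n)

n = 8585
sigma = math.sqrt(6.6168)    # from the uncorrected second moment
print(round(se_mu3_normal(sigma, n), 2), round(se_mu4_normal(sigma, n), 2))
```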

21.16. It will be evident that the standard errors of moments of high
order are very large; for the moments increase rapidly, and the standard
error of the moment of order $q$ depends on the moment of order $2q$. For
example, in the normal distribution, for $q = 6$, $\mu_{2q} = 10{,}395\sigma^{12}$ and $\sigma_{\mu_6}$ will
be of the order $100\sigma^6/\sqrt{n}$, whereas $\mu_6 = 15\sigma^6$. Unless, therefore, $n$ is at least
400, the range $3\sigma_{\mu_6}$ will be greater than the value of $\mu_6$, and hence we
cannot locate the value of $\mu_6$ in the universe with any exactness. Our
approximations, in fact, break down if the deviations are large.

The large sampling errors of moments of high orders prevent the use
of moments higher than the fourth in most practical problems.


Correlation between Errors in Mean and Standard Deviation. 


21.17. From equation (21.10), putting $q = 1$, $r = 2$, and remembering
that $\mu_1 = 0$, we have:

$$\sigma_h\sigma_{\mu_2}r_{h\mu_2} = \frac{\mu_3}{n}$$

Hence, if $\mu_3 = 0$, errors in the mean and variance, and hence in the
mean and s.d., are uncorrelated. In particular, we have the important result
that errors in the mean and s.d. in a normal universe are uncorrelated.

Standard Error of the Coefficient of Variation. 

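21.18. The coefficient of variation $V$ is defined as

$$V = \frac{100\sigma}{h} = \frac{100\sqrt{\mu_2}}{h}$$

Hence,

$$V + \delta V = \frac{100\sqrt{\mu_2 + \delta\mu_2}}{h + \delta h} = \frac{100\sqrt{\mu_2}}{h}\left(1 + \frac{\delta\mu_2}{\mu_2}\right)^{1/2}\left(1 + \frac{\delta h}{h}\right)^{-1}$$

$$= V\left(1 + \frac{\delta\mu_2}{2\mu_2} + \ldots\right)\left(1 - \frac{\delta h}{h} + \ldots\right)$$

Neglecting quantities small compared with $\delta\mu_2$ and $\delta h$, this becomes

$$V\left(1 + \frac{\delta\mu_2}{2\mu_2} - \frac{\delta h}{h}\right)$$

Hence,

$$\frac{\delta V}{V} = \frac{\delta\mu_2}{2\mu_2} - \frac{\delta h}{h}$$

$$\frac{(\delta V)^2}{V^2} = \frac{(\delta\mu_2)^2}{4\mu_2^2} + \frac{(\delta h)^2}{h^2} - \frac{\delta\mu_2\,\delta h}{\mu_2 h}$$

Summing for all samples we have:

$$\frac{\sigma_V^2}{V^2} = \frac{\sigma_{\mu_2}^2}{4\mu_2^2} + \frac{\sigma_h^2}{h^2} - \frac{\sigma_{\mu_2}\sigma_h r_{\mu_2 h}}{\mu_2 h}$$

If the distribution is normal:

$$\sigma_{\mu_2}^2 = \frac{2\mu_2^2}{n}, \quad \sigma_h^2 = \frac{\mu_2}{n}$$

and $r_{\mu_2 h} = 0$ (21.17).
Hence,

$$\frac{\sigma_V^2}{V^2} = \frac{1}{2n} + \frac{\sigma^2}{nh^2} = \frac{1}{2n}\left(1 + \frac{2V^2}{10^4}\right)$$

Hence,

$$\sigma_V = \frac{V}{\sqrt{2n}}\left(1 + \frac{2V^2}{10^4}\right)^{1/2} \qquad (21.24)$$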





In many practical cases the second factor differs little from unity and
$V/\sqrt{2n}$ will give a sufficiently precise result.
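
The size of the correcting factor in (21.24) is easily examined numerically. In the sketch below the values of V and n are hypothetical and the function name is ours; a normal universe is assumed, as in the derivation.

```python
import math

def se_cv_normal(v, n):
    """Equation (21.24); v is the coefficient of variation expressed in per cent."""
    return v / math.sqrt(2.0 * n) * math.sqrt(1.0 + 2.0 * v ** 2 / 1.0e4)

# Hypothetical case: V = 10 per cent, n = 500.
v, n = 10.0, 500
full = se_cv_normal(v, n)
simple = v / math.sqrt(2.0 * n)     # the approximation V / sqrt(2n)
print(round(full, 4), round(simple, 4), round(full / simple, 4))
```

For a coefficient of variation of 10 per cent the factor exceeds unity by about 1 per cent only, so that the simpler expression suffices.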

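Standard Errors of $\beta_1$ and $\beta_2$.

21.19. The standard errors of $\beta_1$ and $\beta_2$ can be deduced in a similar
manner.

In fact,

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$$

$$\beta_1 + \delta\beta_1 = \frac{(\mu_3 + \delta\mu_3)^2}{(\mu_2 + \delta\mu_2)^3}$$

which, after some reduction, gives

$$\frac{\delta\beta_1}{\beta_1} = \frac{2\,\delta\mu_3}{\mu_3} - \frac{3\,\delta\mu_2}{\mu_2}$$

Squaring and summing for all samples:

$$\sigma_{\beta_1}^2 = \frac{4\mu_3^2}{\mu_2^6}\sigma_{\mu_3}^2 + \frac{9\mu_3^4}{\mu_2^8}\sigma_{\mu_2}^2 - \frac{12\mu_3^3}{\mu_2^7}\sigma_{\mu_2}\sigma_{\mu_3}r_{\mu_2\mu_3}$$

$$= \frac{4\mu_3^2}{n\mu_2^6}(\mu_6 - \mu_3^2 - 6\mu_2\mu_4 + 9\mu_2^3) + \frac{9\mu_3^4}{n\mu_2^8}(\mu_4 - \mu_2^2) - \frac{12\mu_3^3}{n\mu_2^7}(\mu_5 - 4\mu_2\mu_3)$$

In terms of $\beta_1$, $\beta_2$, $\beta_3$ and $\beta_4$ (see p. 10], footnote, for definition of the
higher $\beta$’s),

$$\sigma_{\beta_1}^2 = \frac{\beta_1}{n}\{4\beta_4 - 24\beta_2 + 36 + 9\beta_1\beta_2 - 12\beta_3 + 35\beta_1\} \qquad (21.25)$$

Similarly,

$$\sigma_{\beta_2}^2 = \frac{1}{n}\{\beta_6 - 4\beta_2\beta_4 + 4\beta_2^3 - \beta_2^2 + 16\beta_2\beta_1 - 8\beta_3 + 16\beta_1\} \qquad (21.26)$$

The labour of evaluating these quantities may be obviated by the use
of tables given in “Tables for Statisticians and Biometricians, Part I.”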
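
Equations (21.25) and (21.26) may be evaluated as in the sketch below. It assumes the usual Pearson definitions of the higher β's, namely β_3 = μ_3 μ_5 / μ_2^4, β_4 = μ_6 / μ_2^3 and β_6 = μ_8 / μ_2^4; these definitions are our assumption here, the text giving them only by reference to a footnote. The function names are likewise ours.

```python
import math

def var_beta1(n, b1, b2, b3, b4):
    """Equation (21.25): sampling variance of beta_1."""
    return b1 / n * (4 * b4 - 24 * b2 + 36 + 9 * b1 * b2 - 12 * b3 + 35 * b1)

def var_beta2(n, b1, b2, b3, b4, b6):
    """Equation (21.26): sampling variance of beta_2."""
    return (b6 - 4 * b2 * b4 + 4 * b2 ** 3 - b2 ** 2
            + 16 * b2 * b1 - 8 * b3 + 16 * b1) / n

# Normal universe: beta_1 = beta_3 = 0, beta_2 = 3, beta_4 = 15, beta_6 = 105.
n = 1000
print(var_beta1(n, 0, 3, 0, 15))
print(math.sqrt(var_beta2(n, 0, 3, 0, 15, 105)))
```

For the normal universe the second expression reduces to 24/n, while the first vanishes with β_1.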

21.20. There is here one important point to be noted. In equation
(21.24), if $V = 0$, $\sigma_V = 0$. Similarly, in equation (21.25), if $\beta_1 = 0$, $\sigma_{\beta_1} = 0$.
It might be thought from this that if in a large sample we find in the one
case that $V = 0$ (and hence that $\sigma = 0$), or in the other case that the
distribution is symmetrical, then $V = 0$ or $\beta_1 = 0$ in the universe. This is not
necessarily true.

$V$ will vanish only if all members of the sample give the same value
of the variate. If the sample is large, it will be evident that if there is
any variation in the parent it must be small; but it is not impossible
that members should exist showing deviations from the observed value.
The explanation is to be found in the terms which we have neglected
in our approximations. These, though in general small compared with
the terms retained, may be important if the terms retained themselves
vanish. Furthermore, our assumption that the sample value may be
assumed to be the parent value may be unjustified if both are very small
compared with their difference. Equations such as (21.24) and (21.25)
must, therefore, be treated carefully in the neighbourhood of values which
cause them to vanish.

21 . 21 . From the foregoing work the student will have no difficulty 
in accepting the statement that it is possible to calculate the standard 
error of any quantity which is expressible as a function of the moments. 
Such a standard error would, however, be applicable only to a value 
which had actually been calculated from the moments, and not arrived 
at by some other means. We shall not pursue the subject further in this
book, but we may point out that the standard errors of certain quantities,
such as an approximation to the Pearson measure of skewness (9.12), have
been tabulated in “Tables for Statisticians and Biometricians” for different
values of $\beta_1$ and $\beta_2$. The same tables also contain some results of interest
in connection with the sampling distributions of range.

We now turn to the parameters of multivariate universes, the
correlation coefficients, regression coefficients, and some of the measures of
association.


Standard Error of the Correlation Coefficient. 

21.22. For samples from a normal universe the standard error of
the correlation coefficient is given by

$$\sigma_r = \frac{1 - r^2}{\sqrt{n}} \qquad (21.27)$$

A proof of this result would take us beyond the scope of the present
work. The student who is acquainted with the differential and integral
calculus may refer to ref. (459).

The formula applies also to partial correlations. 

21.23. Formula (21.27) is sometimes used to estimate the precision
of correlation coefficients obtained by the use of the product-moment
formula without reference to the nature of the universe. This practice
is hardly to be commended, although sometimes there is nothing better
to do. It is, however, possible to generalise the procedure of sections
21.2 to 21.8 to the bivariate case, and it may be shown that

$$\sigma_r^2 = \frac{r^2}{n}\left\{\frac{\mu_{22}}{\mu_{11}^2} + \frac{\mu_{40}}{4\mu_{20}^2} + \frac{\mu_{04}}{4\mu_{02}^2} + \frac{\mu_{22}}{2\mu_{20}\mu_{02}} - \frac{\mu_{31}}{\mu_{11}\mu_{20}} - \frac{\mu_{13}}{\mu_{11}\mu_{02}}\right\} \qquad (21.28)$$

(For the definition of the bivariate moments, see footnote, p. 214.)

In addition, if the regression is linear, denoting the $\beta_2$’s of the two
variates considered separately by $\beta_2$, $\beta_2'$,

$$\sigma_r^2 = \frac{(1 - r^2)^2}{n}\left\{1 + \frac{r^2}{4(1 - r^2)}(\beta_2 - 3 + \beta_2' - 3)\right\} \qquad (21.29)$$

which reduces to (21.27) if the kurtosis is zero.

If the distribution is not normal and $r$ is not small, the difference between
the values given by (21.27) and (21.29) may be considerable; but it may
be noticed that the value given by (21.27) is greater than that given by (21.29)
if the distribution is platykurtic for both variates, and less if the
distribution is leptokurtic for both variates.

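21.24. In particular, it may be shown that for a 2×2 table in which
the frequencies are $(AB)$, $(A\beta)$, $(\alpha B)$ and $(\alpha\beta)$, the standard error of the
correlation coefficient calculated by the product-moment method on the
assumption that the frequencies are concentrated at points is given by

$$\sigma_r = \frac{1}{\sqrt{N}}\left\{1 - r^2 + \left(r + \tfrac{1}{2}r^3\right)\frac{[(A) - (\alpha)][(B) - (\beta)]}{\sqrt{(A)(\alpha)(B)(\beta)}} - \frac{3}{4}r^2\left[\frac{[(A) - (\alpha)]^2}{(A)(\alpha)} + \frac{[(B) - (\beta)]^2}{(B)(\beta)}\right]\right\}^{1/2} \qquad (21.30)$$

where $N$ is the total frequency and $(A)$, $(\alpha)$, $(B)$, $(\beta)$ are the marginal totals.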


21.25. The standard error of tetrachoric $r$, as calculated in the
manner of 13.23, is given by very complicated expressions which we do
not reproduce. The student may be referred to ref. (Mi5) for an
approximate form and certain tables to facilitate the arithmetic.


Example 21.6.—In the data of Table 11.3, page 109, we found that
the correlation between the stature of the father and the stature of the
son was 0.51. Regarding these data as a sample of 1078 from the universe
of fathers and sons, we have:

$$\text{Standard error of } r = \frac{1 - r^2}{\sqrt{n}} = \frac{1 - (0.51)^2}{\sqrt{1078}} = 0.023 \text{ approximately}$$

Hence, if the sampling was simple, the correlation in the universe
most probably lies within 0.44 and 0.58. It is thus undoubtedly real.
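
The calculation of Example 21.6 may be reproduced as follows. The function name is an assumption for illustration; the formula is (21.27) and presupposes a normal universe and simple sampling.

```python
import math

def se_r_normal(r, n):
    """Equation (21.27): standard error of r for samples from a normal universe."""
    return (1.0 - r ** 2) / math.sqrt(n)

r, n = 0.51, 1078                 # figures of Example 21.6
se = se_r_normal(r, n)
lower, upper = r - 3 * se, r + 3 * se
print(round(se, 3), round(lower, 2), round(upper, 2))
```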

Example 21.7.—In considering data from 11, UG cows, J. F. Tocher
found a negative correlation of 0.0790 between yield of milk per week and
percentage of butter fat. Is this significant, i.e. could it have arisen from
an uncorrelated universe by sampling fluctuations?

If $r = 0$,

$$\sigma_r = \frac{1}{\sqrt{n}} = 0.008$$

The correlation observed is ten times this, and small though it is,
could not have arisen from sampling fluctuations.

In this example we may reiterate the caution to be observed in
inferring from the sample anything about the universe (cows in the
United Kingdom) as a whole. The records were, in fact, taken by the
Scottish Milk Records Association from constituent associations at various
years between 1908 and 1923. The conditions of simple sampling may,
therefore, have been violated both in regard to time and in regard to
place.

Standard Error of the Coefficient of Regression. 

21.26. The standard error of the coefficient of regression from a
normal universe is given by

$$\sigma_{b_{12}} = \frac{\sigma_1\sqrt{1 - r_{12}^2}}{\sigma_2\sqrt{n}} = \frac{\sigma_{1.2}}{\sigma_2\sqrt{n}} \qquad (21.31)$$

This again applies to a regression coefficient of any order, total or
partial, i.e. in terms of our general notation, $k$ denoting any collection of
secondary subscripts other than 1 or 2,

$$\text{Standard error of } b_{12.k} = \frac{\sigma_{1.2k}}{\sigma_{2.k}\sqrt{n}}$$

for a normal distribution.

The Correlation Ratio and Coefficient of Multiple Correlation. 

21.27. It has been shown that the sampling distributions of the
correlation ratio and the multiple correlation coefficient from normal
universes do not tend to the normal form for large samples, although they
do give single-humped distributions. The use of a standard error in such
cases must be made with great caution, and it is probably better to apply
one of the tests of significance which we shall consider later in connection
with the theory of small samples. The formula usually given for the
standard error of the correlation ratio is an approximate one:

$$\sigma_\eta = \frac{1 - \eta^2}{\sqrt{n}} \qquad (21.32)$$


21.28. Somewhat similar remarks apply to the coefficient $\zeta = \eta^2 - r^2$
which, as we saw in 13.8, may be used to test the linearity of regression.
The use of a standard error for $\zeta$ in an attempt to gauge the significance of
a departure from linearity has been subjected to very damaging criticism
by R. A. Fisher.

Example 21.8.—Consider the data of Example 14.2, page 272 (relation
between pauperism, age of population and number of population).

We found:

$$x_1 = 0.325x_2 + 1.383x_3 - 0.383x_4$$

Taking this to be given by a random sample from a normal universe, is
the value 0.325 significant?

We have:

$$\sigma_{b_{12.34}} = \frac{\sigma_{1.234}}{\sigma_{2.34}\sqrt{n}} = \frac{\sigma_{1.34}\sqrt{1 - r_{12.34}^2}}{\sigma_{2.34}\sqrt{n}} = \frac{22.8\sqrt{1 - 0.457^2}}{32.1\sqrt{32}} = 0.11$$

The coefficient $b_{12.34}$ is therefore significant.

In this example the number in the sample is not as large as one might
wish and the standard error is probably underestimated; but if any
doubt exists it is possible to make more definite tests by the methods of
Chapter 23.





Standard Error of Coefficient of Association. 


21.29. We may refer briefly to the quantities treated in Chapters 3,
4 and 5 in considering the association of attributes.

The coefficient of association, $Q$, defined in 3.15, has a standard error
given by

$$\sigma_Q = \frac{1 - Q^2}{2}\sqrt{\frac{1}{(AB)} + \frac{1}{(A\beta)} + \frac{1}{(\alpha B)} + \frac{1}{(\alpha\beta)}} \qquad (21.33)$$

This quantity is not infinite, as might at first sight appear, if one of
the cell frequencies vanishes, because in that case $1 - Q^2$ also vanishes; in
fact, in such an event $\sigma_Q = 0$.
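
Equation (21.33) may be applied to a fourfold table as in the following sketch. The cell frequencies are hypothetical, the function names are ours, and the code assumes that no cell frequency is zero (the limiting case discussed above would otherwise lead to a division by zero).

```python
import math

def yule_q(ab, a_beta, alpha_b, alpha_beta):
    """Coefficient of association Q for the cells (AB), (A beta), (alpha B), (alpha beta)."""
    cross = ab * alpha_beta
    other = a_beta * alpha_b
    return (cross - other) / (cross + other)

def se_q(ab, a_beta, alpha_b, alpha_beta):
    """Equation (21.33); every cell frequency is assumed to be positive."""
    q = yule_q(ab, a_beta, alpha_b, alpha_beta)
    recip = 1.0 / ab + 1.0 / a_beta + 1.0 / alpha_b + 1.0 / alpha_beta
    return (1.0 - q ** 2) / 2.0 * math.sqrt(recip)

# Hypothetical table (not data from the text).
cells = (40, 10, 20, 30)
print(round(yule_q(*cells), 4), round(se_q(*cells), 4))
```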


Standard Error of the Coefficient of Mean Square Contingency. 

21.30. The determination of the standard error of the coefficient of
mean square contingency is a matter of considerable mathematical
complexity, and even when approximations are employed, leads to expressions
which are tedious to calculate in practice. For a detailed discussion we
must refer the student to the original memoirs (refs. (448) and (489)).


The Rank Correlation Coefficient. 


21.31. Unlike most of the parameters we have been considering, 
the distribution of the rank correlation coefficient is discontinuous, and 
to that extent resembles the binomial. Very little is known about the 
distribution except in the important case when the correlation in the 
universe is zero. The other cases are sometimes treated by assuming a 
normal continuous distribution in the parent and working from ranks to 
grades and thence to the product-moment coefficient of correlation by 
the equations (13.11) and (13.12) of 13.21 ; but this procedure is hardly 
to be recommended. 

The case when the correlation in the universe is zero, i.e. when all 
possible permutations of the ranks occur with equal frequency, has to some 
extent been investigated. It was shown by “Student” in 1907 that the 
standard deviation of the rank correlation coefficient is given by the simple 
equation 

$$\sigma_\rho = \frac{1}{\sqrt{n - 1}} \qquad (21.34)$$


This cannot be taken to be a standard error in the ordinary way, 
because the distribution is not normal for small samples. But it has been 
shown by Hotelling and Pabst (ref. (540)) that for large samples the 
distribution may be taken to be continuous and normal, whether the 
universe can be regarded as classified according to a continuous variate or 
not. The appearance of the normal curve in this connection is peculiar 
and unexpected, for the distribution in small samples might lead one to 
expect a bimodal distribution. 

21.32. It has been shown ¹ that for low values of n the normal distribution
gives an unsatisfactory approximation, but that for values of n greater
than 8 the significance of an observed $\rho$ can be tested in the t-distribution
(see below, 23.15) by entering the tables with
$t = \rho\sqrt{n - 2}/\sqrt{1 - \rho^2}$ and $\nu = n - 2$. For values of n up to
and including 8 the exact distributions are given in the paper under reference.
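
As a small illustration of the rule just stated, the following sketch computes the quantity t and the degrees of freedom with which the t-tables are entered. The function name `rank_correlation_t` and the numerical values ($\rho = 0.6$, n = 12) are hypothetical and chosen only for illustration; the function refuses values of n of 8 or less, for which the text refers the reader to the exact distributions.

```python
import math


def rank_correlation_t(rho, n):
    """Return (t, nu) for testing an observed rank correlation rho.

    Only meaningful for n > 8 and |rho| < 1, as in the text.
    """
    if n <= 8:
        raise ValueError("for n <= 8 use the exact distribution instead")
    if abs(rho) >= 1:
        raise ValueError("t is undefined when rho is +1 or -1")
    t = rho * math.sqrt(n - 2) / math.sqrt(1 - rho * rho)
    return t, n - 2


# Hypothetical values, for illustration only.
t, nu = rank_correlation_t(0.6, 12)
print(round(t, 3), nu)
```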

1 Kendall and others, “ The Distribution of Spearman’s Coefficient of Rank Correia- 






4. From the results of (2) and (3), and similar results for moments 
about a fixed point, it is possible to calculate the standard error of any 
function of the moments. 

5. In the normal universe, errors in the mean and standard deviation 
are uncorrelated. 

6. In calculating the standard errors of moments the uncorrected 
values should be used. 

7. It is unsafe to use the formulae for standard errors appropriate to the
normal universe in cases where the universe is suspected to differ from the
normal form; in particular, the formula for the standard error of the
standard deviation, $\sigma/\sqrt{2n}$, should not be used for parent universes
which are markedly lepto- or platy-kurtic.





EXERCISES. 

21.1. In the weight distribution of Exercise 6.6, page 111, last column, find
the standard error of the standard deviation. Compare it with the value
obtained on the assumption that the parent distribution is normal.

21.2. In the same data, compare the ratio of the s.e. of the s.d. to the s.d. 
with the ratio of the s.e. of the semi-interquartile range to the semi-interquartile 
range. 

21.3. Show that for a normal universe the standard error of the s.d. is less
than the standard error of the semi-interquartile range.

21.4. In a sample of 1000 the mean is found to be 17.5 and the standard
deviation 2.5. In another sample of 800 the mean is 18 and the standard
deviation 2.7. Assuming that the samples are independent, discuss whether
the two samples can have come from universes which have the same standard
deviation.

21.5. Find the correlation between errors in the mean and standard deviation
for the height distribution of 8585 men of Table 6.7, page 94, and do the same
for the marriage distribution of Table 6.8, page 96.

21.6. Find the standard errors of the first four seminvariants as calculated 
from the moments. 

21.7. Samples of 10,000 are taken from a normal universe. For what even 
moments does the standard error of the moment lie within 10 per cent, of the 
value of that moment? 

21.8. For samples of (a) 100, ( b ) 1000, draw a graph showing how the 
standard error of the correlation coefficient from a normal universe varies with r. 

21.9. (Data quoted by M. F. Hoadley, “Note on the Association of Relative 
Laterality of Hand and Eye from the Cambridge Anthropometric Data,” 
Biometrika, vol. 20B, 1928, p. 401.)

Three experiments were conducted to determine the relationship between
laterality of hand and laterality of eye. The correlations between (1) difference
of strength of grip and (2) difference in visual acuity were:

-0.02410 (8234 subjects)

-0.00738 (4003 subjects)

+0.02962 (1447 subjects)

Find the standard errors of the three correlation coefficients, and hence show 
that it cannot be concluded that there is any significant correlation between 
laterality of hand and laterality of eye. 

21.10. Find the standard errors of the partial correlation coefficients of
Example 14.1, page 270. Hence state whether any one is not significantly
different from zero, and if so, which. For the purpose of this exercise normality
may be assumed, although in all probability the actual data do not emanate
from a normal universe.



CHAPTER 22. 


THE $\chi^2$ DISTRIBUTION.

22.1. In Chapters 19 to 21 we have seen that a knowledge of the
sampling distribution of a parameter gives us a means of judging from
samples the relationship between fact and theory. For instance, in
Example 19.3, page 352, we were able to infer from a knowledge of the
binomial distribution that the dice which provided the data were probably
biased; and in Example 20.6, page 386, we could apply a knowledge of
the distribution of the mean of samples from a normal population to reject
the hypothesis that the mean in the universe was less than 67 inches.

In the present chapter we shall discuss a particular sampling distribution
of profound importance in statistical theory, and shall note its
applications to the testing of accordance between fact and hypothesis in
a wide range of cases.

Cells. 

22.2. In what follows we shall consider only data giving the frequencies
of individuals falling within various categories. Statistical data,
as will have been evident from the examples already given in this book,
are very often of this type.

Such data, whether relating to attributes or to continuous variates 
or to a mixture of both, will in practice be arranged in compartments. 
For example, in the association table on page to there are four com¬ 
partments, corresponding to the four ultimate classes. In the table of 
frequencies within various height ranges (Table 6.7, p. 94), each range
determines a compartment, and the data consist of 8585 individuals
distributed in 21 groups.

It is convenient to have a name for those compartments. We shall 
call them cells. The frequency falling in a cell will be referred to as the 

cell frequency. 

One and the same table may contain frequencies of more than one 
order, and frequencies of different orders must be kept distinct. Thus 
an association table has four cells with frequencies of the second order 
and two sets of two (the border frequencies) of the first order. A p × q
contingency table has pq cells of the second order (to condense our
terminology) and a set of p and a set of q of the first order. Each such set
must be considered by itself. The tests of this chapter are applicable
to any homogeneous set, but not to a "mixed" set comprising cells of
different orders.

22.3. We shall denote the number of cells in the presentation of a
set of data by n, and the cell frequency occurring in the rth cell by $m_r$.
Thus, in the table of page 94 we have, numbering the cells downwards:

$$m_1 = 2,\quad m_2 = 4,\quad m_3 = 14,\quad \ldots,\quad m_{21} = 2$$

22.4. In the class of cases we shall consider, we wish to compare
the actual values m with the cell frequencies which would exist if a
particular hypothesis H were exactly verified. These latter values we
shall denote by the letter $\tilde m$, so that the theoretical frequency in the rth
cell is $\tilde m_r$.

The cell frequencies $\tilde m_r$ are sometimes referred to as the "expected"
values on the hypothesis H. This is rather a special use of the word
"expected," in the sense we have already given, namely, that the $\tilde m$'s
assume the values which they would take if the hypothesis were exactly
verified for the particular set of data.

We shall write:

$$x_r = m_r - \tilde m_r \qquad (22.1)$$

so that the $x_r$'s are the excesses of the actual over the expected frequencies.

Clearly the quantities x embody all the information in the data about
the discrepancies between theory and fact. If the x's are all zero, fact
and theory are in perfect agreement. If the x's are large, the agreement
is poor.

Example 22.1. As a simple example let us consider the 2 × 2 contingency
table of Example 3.5, page 40. Numbering the cells from left
to right we have:

$$m_1 = 276,\quad m_2 = 3$$
$$m_3 = 473,\quad m_4 = 66$$

Now let our hypothesis H be that inoculation and exemption from attack
are independent. If this be so, the expected frequencies are:

$$\tilde m_1 = 255.5,\quad \tilde m_2 = 23.5$$
$$\tilde m_3 = 493.5,\quad \tilde m_4 = 45.5$$

and hence we have:

$$x_1 = m_1 - \tilde m_1 = 20.5,\quad x_2 = -20.5$$
$$x_3 = -20.5,\quad x_4 = 20.5$$

The x's are, in fact, in this particular case, the numbers we referred to in
Chapter 3 as $\delta$-numbers. We have already considered them as reflecting
the divergence of fact from theory.
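
The arithmetic of this example can be checked with a short sketch. The function name `expected_2x2` is hypothetical; it simply forms, for each cell, the product of the two border frequencies divided by the total frequency, which is the expectation on the hypothesis of independence. The observed frequencies are taken as 276, 3, 473 and 66.

```python
def expected_2x2(m1, m2, m3, m4):
    """Expected frequencies of a 2 x 2 table on the hypothesis of independence.

    Cells are numbered from left to right: (m1, m2) is the first row,
    (m3, m4) the second.
    """
    total = m1 + m2 + m3 + m4
    row1, row2 = m1 + m2, m3 + m4
    col1, col2 = m1 + m3, m2 + m4
    return (row1 * col1 / total, row1 * col2 / total,
            row2 * col1 / total, row2 * col2 / total)


observed = (276, 3, 473, 66)
expected = expected_2x2(*observed)
deviations = [m - e for m, e in zip(observed, expected)]

print([round(e, 1) for e in expected])
print([round(x, 1) for x in deviations])
```

The four deviations are equal in magnitude and add to zero in every row and column, which is the point taken up in the next section.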

Constraints. 

22.5. In the example we have just considered, one important effect
is to be noted, viz. that when we have calculated one independent
frequency, say $\tilde m_1$, the other three follow arithmetically from the fact
that the two frequencies in any row or column must add up to the border
frequency in that row or column.

In fact, we have:

$$\left.\begin{aligned} x_1 + x_2 &= 0 \\ x_1 + x_3 &= 0 \\ x_2 + x_4 &= 0 \end{aligned}\right\} \qquad (22.2)$$



We need not add $x_3 + x_4 = 0$, since this is given by the last two equations
in conjunction with the first. There are only three independent equations.

Thus, whatever our hypothesis H may be, the conditions of the
problem impose limitations, expressed by the equations (22.2), on the
way in which the $\tilde m$'s and the x's may be chosen. If one $\tilde m$ or one x
is fixed by H, the other three are determinate in accordance with the
conditions of the data themselves.

Similarly, suppose we wished to examine the height data of page 94
in the light of the hypothesis that the parent distribution, of which this
is a sample, is normal with given mean and standard deviation. With
the aid of the table of the probability integral we can determine the cell
frequencies on this hypothesis; but again the problem imposes a limitation
on the way in which the theoretical cell frequencies are assigned,
namely, that they must add up to the total number 8585 of the sample.
When 20 frequencies are fixed, the other is determined by mere arithmetic.

22.6. In general, when the conditions of the problem impose limitations
of this kind on the number of cell frequencies which may be fixed
by H we say, borrowing an expression from Statics, that they impose
constraints. In the example of the 2 × 2 contingency table there were
three independent constraints, expressed by the equations (22.2). In the
case of the height distribution there is one constraint expressed by the
fact that the sum of the cell frequencies must be 8585.

Linear Constraints. 

22.7. Constraints which involve linear equations in the cell frequencies
(i.e. equations containing no squares or higher powers of the frequencies)
are called linear constraints. The two instances above are of this
type. Linear constraints are of paramount importance, and we shall
shortly confine our attention to them alone.

Degrees of Freedom. 

22.8. We denote the number of independent constraints in a set
of data by k. We then define the number $\nu$ by the simple equation

$$\nu = n - k$$

and call $\nu$ the number of degrees of freedom of the aggregate of cells.
It is the number of cell frequencies which can be assigned at will, the
remaining k following from the conditions to which the data are subject.

Thus, for the 2 × 2 table k = 3 and $\nu = 1$, for, as we have seen, the fixing
of one cell frequency fixes them all. For the height distribution k = 1,
$\nu = 20$.

Example 22.2. Let us find the number of degrees of freedom of a
p × q contingency table.

The constraints of such a table are similar to those of the 2 × 2 table.
Thus the sum of the cell frequencies in each row is determined as being
the border frequency in that row, and similarly for the columns. Hence
each of the p columns and q rows imposes a constraint. From the total
p + q constraints we must, however, subtract one, for they are not
algebraically independent; there is one relation between them, expressed
by the fact that the sum of the border column equals the sum of the
border row, namely, the total frequency N.

Hence there are p + q - 1 independent linear constraints. Hence

$$\nu = n - k = pq - (p + q - 1) = (p - 1)(q - 1)$$

We might have got this result more directly by considering that the
cell frequencies in the first p - 1 columns and q - 1 rows are determinable
at will, the rest following automatically from the border frequencies.
Hence the number of degrees of freedom, being the number of cells which
can be so filled, is (p - 1)(q - 1) as before.
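
The counting argument of this example may be sketched as follows. The function name `degrees_of_freedom` is hypothetical; it merely subtracts the p + q - 1 independent border constraints from the pq cells, so that the agreement with (p - 1)(q - 1) can be confirmed for a few table sizes.

```python
def degrees_of_freedom(p, q):
    """Degrees of freedom of a p x q contingency table with fixed borders."""
    n_cells = p * q
    k_constraints = p + q - 1   # p + q border totals, less one dependency
    return n_cells - k_constraints


for p in range(2, 6):
    for q in range(2, 6):
        # Agrees with the direct count (p - 1)(q - 1).
        assert degrees_of_freedom(p, q) == (p - 1) * (q - 1)

print(degrees_of_freedom(2, 2), degrees_of_freedom(2, 8))
```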

22.9. Now let us consider a set of data arranged in n cells, the total
frequency being N.

The theoretical frequency in the rth cell is $\tilde m_r$. This means that the
chance of an individual falling into this cell is $\tilde m_r/N$ and the chance of its
not doing so is $(1 - \tilde m_r/N)$. We may regard the actual frequencies m as
having been arrived at by distributing the N individuals among the
n cells in such a way that the chance of an individual falling into the
rth cell is $\tilde m_r/N$. Hence the probability that of the N individuals, $m_r$ fall
into the rth cell and the remainder elsewhere is the term in
$(\tilde m_r/N)^{m_r}$ in the binomial

$$\left\{\left(1 - \frac{\tilde m_r}{N}\right) + \frac{\tilde m_r}{N}\right\}^N$$

Thus, this binomial will give us the relative frequencies of the various
values which $m_r$ can take in different samples, of which the actual data
form one.

If N is fairly large and $\tilde m_r/N$ is not small, this distribution is
approximately normal with mean $\tilde m_r$. That is to say, $m_r$ is distributed normally
about a mean $\tilde m_r$, or $x_r$ is distributed normally about zero mean.

Definition of $\chi^2$.

22.10. We now define the quantity $\chi^2$ by the equation

$$\chi^2 = S\left(\frac{x_r^2}{\tilde m_r}\right) \qquad (22.3)$$

the summation being taken over the n cells.

The student can verify for himself that this definition is consistent
with that given in equation (5.4), page 68, for the particular case of
divergence from independence in a contingency table.



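We can write $\chi^2$ in a slightly different form. For

$$\chi^2 = S\left\{\frac{(m_r - \tilde m_r)^2}{\tilde m_r}\right\} = S\left(\frac{m_r^2}{\tilde m_r}\right) - 2S(m_r) + S(\tilde m_r) = S\left(\frac{m_r^2}{\tilde m_r}\right) - N \qquad (22.4)$$

since $S(m_r) = S(\tilde m_r) = N$.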


This corresponds to equation (5.7), page 69.
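
The equivalence of the two forms of $\chi^2$ can be checked numerically. In the sketch below the function names are hypothetical, and the frequencies are those of Example 22.1 (observed 276, 3, 473, 66; theoretical 255.5, 23.5, 493.5, 45.5). The short form is valid only when the observed and theoretical frequencies have the same total N, which the code verifies before using it.

```python
def chi2_from_deviations(observed, theoretical):
    """Sum of x^2 / m~ over the cells, with x = m - m~."""
    return sum((m - mt) ** 2 / mt for m, mt in zip(observed, theoretical))


def chi2_short_form(observed, theoretical):
    """Sum of m^2 / m~ over the cells, less the total frequency N."""
    n_total = sum(observed)
    # The short form assumes both sets of frequencies add up to N.
    assert abs(n_total - sum(theoretical)) < 1e-9
    return sum(m * m / mt for m, mt in zip(observed, theoretical)) - n_total


observed = [276, 3, 473, 66]
theoretical = [255.5, 23.5, 493.5, 45.5]

a = chi2_from_deviations(observed, theoretical)
b = chi2_short_form(observed, theoretical)
print(round(a, 2), round(b, 2))
```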

22.11. If $\chi^2 = 0$ all the x's are zero, and hence the actual cell
frequencies coincide with the expected cell frequencies. On the other hand,
if some or all of the x's are large, $\chi^2$ will be large.

It will thus be evident that $\chi^2$ affords a measure of the correspondence
between fact and theory. It must not be forgotten, however, that it
ignores the signs of the x's and hence takes no cognisance of certain
information which those signs may convey. We shall take up this point
again later.

22.12. If the use of $\chi^2$ is to be satisfactory, we must be able to
distinguish significant values from those which may have arisen by sampling
fluctuations. This leads us to inquire what is the probability of getting
a particular value of $\chi^2$ from a set of $m_r$'s chosen at random, and this in
turn leads to the question: What is the sampling distribution of $\chi^2$?

We shall not give a proof here of the important answer to this question,
but shall content ourselves with quoting it and indicating briefly the
method by which it is obtained.

We have already seen that the sum of n normally distributed variates
is itself normally distributed (12.8). The sum of the squares of n normal
variates is not so distributed, however. In fact, the sum of the squares
of n normal variates, drawn from a universe with unit standard deviation,
is distributed in a form given by the equation

$$y = y_0\, e^{-\chi^2/2}\, \chi^{n-1} \qquad (22.5)$$

where $\chi^2$ is the sum in question.

Now it has already been shown that under the conditions assumed
the x's are each distributed normally about zero mean, and it may be
shown further that $\chi^2$ may be regarded as the sum of the squares of $\nu$
variates each distributed normally with unit s.d. and about a zero mean.
Hence the distribution of $\chi^2$ is given by

$$y = y_0\, e^{-\chi^2/2}\, \chi^{\nu-1} \qquad (22.6)\ ^1$$

1 Since the variate in this expression is $\chi$, the distribution should, perhaps, be
known as the $\chi$-distribution, not the $\chi^2$-distribution. The latter name is, however,
in universal use, and the tables of the integral of equation (22.7) are usually prepared
with argument $\chi^2$.

22.13. It follows, as in 20.8, that if we take a random set of m's
and calculate $\chi^2$ from them, the probability of getting a value of $\chi^2$ as
great as, or greater than, this observed value $\chi_0^2$, is the area of the curve
(22.6) to the right of the ordinate at $\chi_0$ divided by the total area of the
curve; or, in the language of the integral calculus,

$$P = \frac{\int_{\chi_0}^{\infty} e^{-\chi^2/2}\, \chi^{\nu-1}\, d\chi}{\int_{0}^{\infty} e^{-\chi^2/2}\, \chi^{\nu-1}\, d\chi} \qquad (22.7)\ ^1$$

The curve, as we shall see later, extends from 0 to $\infty$, which accounts
for the limits of the integral in the denominator of the above expression.

Tabulation of P for the $\chi^2$ Distribution.

22.14. The rather formidable result of equation (22.7) need occasion
no alarm to the student who is unacquainted with the notation and
methods of the integral calculus. The function P has been tabulated
for certain ranges of $\nu$ and $\chi^2$ in the same way as the probability for the
normal curve, and the tables are in most cases sufficient for the practical
application of the results of the present chapter.

Tables for $\nu = 1$ are given at the end of this book (Appendix Tables
4A and 4B). Tables for $\nu = 2$ to $\nu = 20$ are given in "Tables for Statisticians
and Biometricians, Part I," and in the same book are supplementary
tables for ranges outside those limits. ²

For most practical purposes it is not necessary to calculate P to any
great degree of accuracy, and the diagram in the Appendix has been drawn
to obviate the use of the tables. In this diagram (fig. A1) curves have
been drawn to show the relationship between $\nu$ and $\chi^2$ for various values
of P. The use of the diagram will be apparent from the examples below.
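
Where neither tables nor diagram are to hand, P may be computed directly. The sketch below is one possible approach, not the method of the Tables: it assumes the classical finite series for the integral (22.7), namely $e^{-\chi^2/2}$ multiplied by a sum in powers of $\chi^2$ when $\nu$ is even, and a probability-integral term plus a sum in odd powers of $\chi$ when $\nu$ is odd. The function name `chi2_tail_probability` is hypothetical; `math.erfc` supplies the probability-integral term.

```python
import math


def chi2_tail_probability(chi2, nu):
    """Probability P of a value as great as, or greater than, chi2.

    nu is the number of degrees of freedom (a positive integer).
    """
    if nu < 1 or chi2 < 0:
        raise ValueError("need nu >= 1 and chi2 >= 0")
    tail = math.exp(-chi2 / 2)
    if nu % 2 == 0:
        # 1 + chi^2/2 + chi^4/(2.4) + ... + chi^(nu-2)/(2.4...(nu-2))
        term, total = 1.0, 1.0
        for even in range(2, nu - 1, 2):
            term *= chi2 / even
            total += term
        return tail * total
    # Odd nu: twice the normal tail area beyond chi, plus a finite series
    # chi/1 + chi^3/(1.3) + ... + chi^(nu-2)/(1.3...(nu-2)).
    chi = math.sqrt(chi2)
    p = math.erfc(chi / math.sqrt(2))
    term, total = chi, 0.0
    for odd in range(1, nu - 1, 2):
        if odd > 1:
            term *= chi2 / odd
        total += term
    return p + math.sqrt(2 / math.pi) * tail * total


print(round(chi2_tail_probability(30, 10), 6))
print(round(chi2_tail_probability(6, 7), 4))
```

The values so obtained can be compared with the tabulated values quoted in the examples of this chapter.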

22.15. It is desirable to point out that other writers have used
different letters to denote the number of degrees of freedom. Karl
Pearson, in the tables to which we have just referred, used the number
n', which is one more than our $\nu$; R. A. Fisher writes n instead of our $\nu$,
so that we have:

$$\nu = n' - 1 \text{ (Pearson)} = n \text{ (Fisher)}$$

We have thought it desirable to introduce the symbol $\nu$ in order to avoid
confusion with the use of n' and n as numbers in a sample or in a universe.

The $\chi^2$ Test of Significance when the Theoretical Cell Frequencies
are known a priori.

22.16. Armed with the tables of P, or the diagram of the Appendix,
we can now proceed as follows:


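1 The actual values of P are, expanding this integral,

$$P = \sqrt{\frac{2}{\pi}}\int_{\chi}^{\infty} e^{-x^2/2}\,dx + \sqrt{\frac{2}{\pi}}\, e^{-\chi^2/2}\left(\frac{\chi}{1} + \frac{\chi^3}{1\cdot 3} + \frac{\chi^5}{1\cdot 3\cdot 5} + \ldots + \frac{\chi^{\nu-2}}{1\cdot 3\cdot 5 \ldots (\nu-2)}\right) \quad \text{if } \nu \text{ is odd}$$

$$P = e^{-\chi^2/2}\left(1 + \frac{\chi^2}{2} + \frac{\chi^4}{2\cdot 4} + \frac{\chi^6}{2\cdot 4\cdot 6} + \ldots + \frac{\chi^{\nu-2}}{2\cdot 4\cdot 6 \ldots (\nu-2)}\right) \quad \text{if } \nu \text{ is even}$$

The first term of the first series may be obtained from the probability integral.

2 The work in the introduction to these Tables is inaccurate in some cases,
particularly in the treatment of contingency tables, owing to the use of the wrong number
of degrees of freedom.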




Having decided on the hypothesis to be tested, we calculate from it
the theoretical frequencies $\tilde m_r$. (For the present we assume that this can
be done without reference to the observed frequencies $m_r$. The contrary
case will be considered later.)

From the $m_r$'s and the $\tilde m_r$'s we calculate $\chi^2$ according to (22.3) or (22.4).
We also ascertain $\nu$.

Then, from the tables, we find the value of P corresponding to these
values of $\chi^2$ and $\nu$.

The value P gives us the probability that on random sampling we should
get a value of $\chi^2$ as great as, or greater than, the value actually obtained.

Now, if P is small, our data give us an improbable value of $\chi^2$. Thus
we have the alternative conclusions that either (a) an improbable event
has occurred, or (b) that the divergence of fact from theory is significant
of some real effect and cannot be attributed to fluctuations of sampling.
The smaller P is, the more we incline to the latter alternative; if we do
decide to adopt it, the inferences we draw will depend on the nature of the
problem. Sometimes it will lead us to reject our hypothesis. Sometimes
it will lead us to suspect our sampling technique.

The following examples will illustrate the type of reasoning involved in
applying the $\chi^2$ test.

Example 22.3. In some experiments on dice-throwing W. F. R. Weldon
rolled 12 dice 26,306 times, observing at each throw the number of dice
recording a 5 or a 6.

If the dice are unbiased, the chance of getting a 5 or a 6 with one die
is 1/3. Hence the chances with 12 dice of getting 12 5's or 6's, 11 5's or 6's,
etc., are the successive terms in the binomial $(\frac{1}{3} + \frac{2}{3})^{12}$. Hence the
theoretical frequencies in 26,306 throws are the terms in
$26{,}306\,(\frac{1}{3} + \frac{2}{3})^{12}$. These are our $\tilde m_r$'s.

The following table shows the actual ($m_r$) and the theoretical ($\tilde m_r$)
frequencies, together with the values of $x_r$ and $x_r^2/\tilde m_r$:

Table 22.1. 12 Dice thrown 26,306 Times, a Throw of 5 or 6 reckoned a Success.

Number of      Observed       Theoretical      m - m~       (m - m~)^2 / m~
Successes.     Frequency      Frequency        (x).
               (m).           (m~).

 0                185             203            - 18           1.596
 1              1,149           1,217            - 68           3.800
 2              3,265           3,345            - 80           1.913
 3              5,475           5,576            -101           1.829
 4              6,114           6,273            -159           4.030
 5              5,194           5,018            +176           6.173
 6              3,067           2,927            +140           6.696
 7              1,331           1,254            + 77           4.728
 8                403             392            + 11           0.309
 9                105              87            + 18           3.724
10 and over        18              14            +  4           1.143

Totals         26,306          26,306               0          35.941

(Here m~ stands for the theoretical frequency $\tilde m$.)

Hence $\chi^2 = 35.941$, and $\nu$ = one less than the number of cells = 10.



From the "Tables for Statisticians and Biometricians" we have, when
$\nu = 10$ (n' = 11),

P = 0.000857 for $\chi^2 = 30$
P = 0.000017 for $\chi^2 = 40$

Evidently when $\chi^2 = 35.941$, P will be extremely small. If we want to
evaluate it exactly we can proceed by the methods given in the Tables.
In fact P = 0.000086.

Alternatively, from the diagram we see that when $\chi^2 = 35.94$ and $\nu = 10$,
the value of P lies slightly below 0.0001, for the point with ordinate 10 and
abscissa 35.94 lies close to, but below, the curve labelled P = 0.0001.

Thus the probability that, on random sampling, we should get a value
of $\chi^2$ as great as, or greater than, the observed value is less than one
in 10,000.

We may therefore say that the correspondence between theory and
fact is very poor. The extreme improbability of the observed event
enables us to say with some confidence that the divergence between the
two is significant, and hence that either our sampling technique or our
hypothesis is at fault. Now in this experiment Weldon took particular
care with the dice-throwing, and we may regard it as unlikely that there
was anything seriously wrong with the randomness of the sampling. We
are therefore led to doubt our hypothesis that the dice were unbiased.

Briefly, then, the $\chi^2$ test suggests that the dice were biased.
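
The value of $\chi^2$ in this example can be reproduced from the observed and theoretical frequencies of Table 22.1. The following is a minimal sketch; the function name `chi_squared` is hypothetical, and the number of degrees of freedom is taken, as in the text, to be one less than the number of cells.

```python
def chi_squared(observed, theoretical):
    """Sum over the cells of (m - m~)^2 / m~."""
    return sum((m - mt) ** 2 / mt for m, mt in zip(observed, theoretical))


# Frequencies of 0, 1, ..., 9 and "10 and over" successes (Table 22.1).
observed = [185, 1149, 3265, 5475, 6114, 5194, 3067, 1331, 403, 105, 18]
theoretical = [203, 1217, 3345, 5576, 6273, 5018, 2927, 1254, 392, 87, 14]

chi2 = chi_squared(observed, theoretical)
nu = len(observed) - 1   # one linear constraint: the total is fixed

print(round(chi2, 2), nu)
```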

Example 22.4. (Data from ref. (74).) The following table shows the
result of inoculation against cholera on a certain tea estate:

Table 22.2.

                     Not attacked.     Attacked.     Total.

Inoculated               431               5           436
                        (427.7)          (8.3)

Not-inoculated           291               9           300
                        (294.3)          (5.7)

Total                    722              14           736

We shall explain the figures in brackets presently. The question on which
we want to throw light is: Is there any significant association between
inoculation and attack?

To answer this, let us take for our hypothesis H the supposition that
they are independent. If this is so, the expected frequencies, calculated
in the manner of Chapter 3, are those given in brackets. These we take
to be the $\tilde m$'s, the m's being the actual frequencies. We then have
$x = \pm 3.3$ in each cell, and

$$\chi^2 = (3.3)^2\left\{\frac{1}{427.7} + \frac{1}{8.3} + \frac{1}{294.3} + \frac{1}{5.7}\right\} = 3.27$$

From Appendix Table 4B, P = 0.0706.




Thus if H is true, our data give a result which would be obtained about
seven times in a hundred trials. This is infrequent, but not very
infrequent. Moreover, the theoretical frequencies in the "attacked"
column are not very large. We should therefore be unjustified in rejecting
H on this evidence, but we can say that the data lend some colour to the
supposition that H is not correct.

To sum up, the $\chi^2$ test shows that the data incline us, though not
strongly, to the belief that inoculation and attack are associated.
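
The calculation of this example may be sketched as follows. The function name `chi_squared_2x2` is hypothetical; the expected frequencies are formed from the border totals without rounding. For one degree of freedom the probability P is twice the area of the normal curve beyond $\chi$, which is obtained here from `math.erfc` instead of from the Appendix table.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """chi-squared for a 2 x 2 table (rows (a, b) and (c, d)),
    testing independence with expected values from the border totals."""
    total = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = (a, b, c, d)
    expected = (rows[0] * cols[0] / total, rows[0] * cols[1] / total,
                rows[1] * cols[0] / total, rows[1] * cols[1] / total)
    return sum((m - mt) ** 2 / mt for m, mt in zip(observed, expected))


# Table 22.2: inoculated (431 not attacked, 5 attacked),
# not-inoculated (291 not attacked, 9 attacked).
chi2 = chi_squared_2x2(431, 5, 291, 9)
p = math.erfc(math.sqrt(chi2 / 2))   # tail probability for one degree of freedom

print(round(chi2, 2), round(p, 3))
```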

Example 22.5. (Imaginary data.) An investigator into chocolate
consumption divided the United Kingdom into eight areas and took a
random sample from each, the individuals so obtained being classified as
consumers or non-consumers of chocolate. His results were as follows:

Table 22.3.

Area Number       1.     2.     3.     4.     5.     6.     7.     8.    Total.

Consumers         56     87    142     71     88     72    100    142      758
                 (55)   (81)  (152)   (69)   (90)   (72)   (95)  (144)

Non-consumers     17     20     58     20     31     23     25     48      242
                 (18)   (26)   (48)   (22)   (29)   (23)   (30)   (46)

Total             73    107    200     91    119     95    125    190     1000

Do these results suggest that the consumption of chocolate varies
from place to place?

Let us take as our hypothesis H the supposition that it does not, i.e.
that the two attributes in the above table are independent. The
theoretical frequencies $\tilde m_r$ are then those shown in brackets, and we have:

$$\chi^2 = \frac{1^2}{55} + \frac{6^2}{81} + \text{similar terms} = 6.28$$

The table has two rows and eight columns, and hence $\nu = (2 - 1)(8 - 1) = 7$.
From the diagram of the Appendix, the point whose abscissa is 6.28 and
ordinate 7 lies between the lines P = 0.75 and P = 0.5, very near the latter;
or alternatively, from the "Tables for Statisticians and Biometricians,"
for $\nu = 7$ (n' = 8),

if $\chi^2 = 6$, P = 0.539750

if $\chi^2 = 7$, P = 0.428880

Hence, for $\chi^2 = 6.28$, P = 0.51 approximately.

Thus there is no cause to suspect our hypothesis, and the data do not
suggest that the consumption of chocolate varies from place to place, at
least so far as this test is concerned.
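
The figure 6.28 can be reproduced from Table 22.3. The sketch below uses the bracketed theoretical frequencies exactly as printed, i.e. rounded to whole numbers; the variable names are hypothetical. If unrounded expected frequencies were formed from the border totals instead, a slightly different value of $\chi^2$ would result.

```python
# Observed frequencies for areas 1 to 8 (Table 22.3).
consumers = [56, 87, 142, 71, 88, 72, 100, 142]
non_consumers = [17, 20, 58, 20, 31, 23, 25, 48]

# Bracketed (rounded) theoretical frequencies, as printed in the table.
consumers_theoretical = [55, 81, 152, 69, 90, 72, 95, 144]
non_consumers_theoretical = [18, 26, 48, 22, 29, 23, 30, 46]

observed = consumers + non_consumers
theoretical = consumers_theoretical + non_consumers_theoretical

chi2 = sum((m - mt) ** 2 / mt for m, mt in zip(observed, theoretical))

rows, columns = 2, len(consumers)
nu = (rows - 1) * (columns - 1)

print(round(chi2, 2), nu)
```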





Properties of the $\chi^2$ Distribution.

22.17. The curves

$$y = y_0\, e^{-\chi^2/2}\, \chi^{\nu-1}$$

and the probability function P derived from them, have several interesting
properties which are worth noticing. As $\chi$ is essentially positive, we
consider only positive values of the variate.

(a) In the first place, it will be seen that when $\nu = 1$ the curve is the
normal curve with unit standard deviation, for positive values of the
variate. Thus the test for $\nu = 1$ may be reduced to testing the significance
of deviations of a normally distributed variate.

(b) When $\nu > 1$ the curve is of the single-humped type. It is tangential
to the $\chi$-axis at the origin ($\chi^2 = 0$), rises to a maximum where $\chi^2 = \nu - 1$ and
then falls more slowly to zero as $\chi^2$ increases indefinitely. It is thus skew
to the right.

(c) As $\nu$ increases, the curve becomes more and more symmetrical. In
fact, when $\nu$ is large, $\sqrt{2\chi^2}$ is distributed approximately normally about a
mean $\sqrt{2\nu - 1}$ with unit standard deviation. This result, due to R. A.
Fisher, enables us to dispense with tables of P for large values of $\nu$, say
$\nu > 30$, and to use the probability integral instead. In practice large
values of $\nu$ are rather infrequent.

Example 22.6. To find P when $\chi^2 = 64$ and $\nu = 41$.

We know that $\sqrt{2\chi^2}$ is distributed normally about mean $\sqrt{82 - 1} = 9$
with unit standard deviation. When $\chi^2 = 64$, $\sqrt{2\chi^2} = 11.314$, which
therefore has a deviation 2.314 to the right of the mean. Hence we have
to find the area of the probability curve to the right of the ordinate which
is 2.314 units to the right of the mean. From Appendix Table 2 this is
seen to be 0.0104 approximately.
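
The approximation for large $\nu$ is easily put into computational form. In this sketch the function name `fisher_large_nu_p` is hypothetical, and the area of the normal curve to the right of the deviation is obtained from `math.erfc` instead of from the Appendix table, so the last figure may differ slightly from a value read off a table.

```python
import math


def fisher_large_nu_p(chi2, nu):
    """Approximate P for large nu, treating sqrt(2 chi^2) as normal
    about the mean sqrt(2 nu - 1) with unit standard deviation."""
    deviation = math.sqrt(2 * chi2) - math.sqrt(2 * nu - 1)
    # Area of the unit normal curve to the right of the deviation.
    return 0.5 * math.erfc(deviation / math.sqrt(2))


p = fisher_large_nu_p(64, 41)
print(round(math.sqrt(2 * 64), 3), round(p, 4))
```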

Conditions for the Application of the $\chi^2$ Test.

22.18. We may conveniently bring together at this point the various
precautions which should be observed in applying the $\chi^2$ distribution to a
test of significance.

(a) In the first place, N must be reasonably large. Otherwise the x's
are not normally distributed.

This is a condition which is almost always fulfilled in practice. It is
difficult to say exactly what constitutes largeness, but as an arbitrary
figure we may say that N should be at least 50, however few the number
of cells.

(b) No theoretical cell frequency should be small. Here again it is
hard to say what constitutes smallness, but 5 should be regarded as the
very minimum, and 10 is better.

In practice, data not infrequently contain cell frequencies below these
limits. As a rule the difficulty may be met by amalgamating such cells
into a single cell. Thus, in Example 22.3 above, the theoretical numbers
of throws with 10, 11 and 12 successes are (to the nearest integer) 13, 1
and 0. Instead of putting each into a separate cell we have run them
together into one cell "10 and over."

(c) The constraints must be linear. The reason for this condition has
not emerged explicitly in the foregoing because we omitted the stage in
the proof of the $\chi^2$ distribution at which it occurs.

22.19. To these three conditions we may add the following remarks,
which should also be borne in mind when the $\chi^2$ test is being used.

(a) The $\chi^2$ test tells us the probability of getting, on a random sample,
a value of $\chi^2$ equal to or higher than the actual value. If this probability is
small we are justified in suspecting a significant divergence between theory
and experiment.

We cannot proceed, however, in the reverse direction and say that if P
is not small our hypothesis is proved correct. All that we can say is that
the test reveals no grounds for supposing the hypothesis incorrect; or
alternatively, that so far as the $\chi^2$ test is concerned, data and hypothesis
are in agreement.

(b) Nor do only small values of P lead us to suspect our hypothesis or
our sampling technique. A value of P very near to unity may also
do so.

This rather surprising result arises in this way: a large value of P
normally corresponds to a small value of $\chi^2$, that is to say a very close
agreement between theory and fact. Now such agreements are rare,
almost as rare as great divergences.

We are just as unlikely to get very good correspondence between fact
and theory as we are to get very bad correspondence and, for precisely the
same reasons, we must suspect our sampling technique if we do. In short,
very close correspondence is too good to be true.

The student who feels some hesitation about this statement may like to
reassure himself with the following example. An investigator says that he
threw a die 600 times and got exactly 100 of each number from 1 to 6.
This is the theoretical expectation, $\chi^2 = 0$ and P = 1, but should we believe
him? We might, if we knew him very well, but we should probably
regard him as somewhat lucky, which is only another way of saying that
he has brought off a very improbable event.

22.20. At this point we can resume a topic which we laid on one side
in 22.11, namely the signs of the x's, which are ignored by χ².

It may happen that χ² has quite a moderate value and P is not small
when all the positive x's are on one side of the mode of the theoretical
distribution and all the negative x's on the other. There will thus be a
consistent “shift” of the observed frequencies one way or the other from
the theoretical frequencies. This
may give us a value of the mean quite outside the limits of sampling.
Again, if the x's are all negative in the cells farthest removed from the
mean, the standard deviation may show an almost impossible divergence
from expectation.

Thus, although the χ² test may reveal no cause to suspect the hypothesis,
a closer examination of the x's may.

Example 22.7. Consider the following dice data (Table 22.4) (Weldon,
see p. 851).

Now, in this example, all the x's are negative up to 5 successes, positive
from 6 to 10 successes, and negative again for 11 to 12 successes. This is
almost one of the cases we referred to earlier in this section.

We have, in fact, already found (Example 19.3, page 352) that the
mean deviates from the expected value by 5.13 times the standard error.





424 


THEORY OF STATISTICS 


Table 22.4. 12 Dice thrown 4096 times, a Throw of 4, 5 or 6 Points
reckoned a Success.*

Number of    Observed         Expected Frequency (m),    m̃ − m    (m̃ − m)²/m
Successes.   Frequency (m̃).   4096(½ + ½)¹².

    0             0                    1                   −1       1.0000
    1             7                   12                   −5       2.0833
    2            60                   66                   −6       0.5455
    3           198                  220                  −22       2.2000
    4           430                  495                  −65       8.5354
    5           731                  792                  −61       4.6982
    6           948                  924                  +24       0.6234
    7           847                  792                  +55       3.8194
    8           536                  495                  +41       3.3960
    9           257                  220                  +37       6.2227
   10            71                   66                   +5       0.3788
   11            11 } 11              12 } 13              −2       0.3077
   12             0 }                  1 }

 Totals        4096                 4096                           33.8104 = χ²


* From the tables we find

     ν     n′     χ²     P
    12     13     30     0.002792
    12     13     40     0.000072

Hence, by simple interpolation for χ² = 33.8104, P = 0.0018.

As a matter of fact, simple interpolation is of very little value for small values
of P (cf. 24.12), and this value is wide of the mark, the true value being 0.00072.

A better idea is to be gained from the Appendix diagram, from which it is seen
that P lies between 0.001 and 0.0001. In any case, the value of P is small, but not
overwhelmingly small.

From the extended tables of the normal integral in “Tables for Statisticians
and Biometricians,” Part II, we have:

Greater fraction of the area of a normal
  curve for a deviation 5.13 . . . . . .  0.9999998551
Area in the tail of the curve . . . . .  0.0000001449
Area in both tails . . . . . . . . . . .  0.0000002898

so that the probability of getting such a deviation (+ or −) on random
sampling is only about 3 in 10,000,000.

Comparing this with the value of P, we see that the data are really more
divergent from theory than the χ² test would lead us to suppose.
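The two probabilities compared here can be recomputed directly. The sketch below is our own illustration, not the authors' procedure: it recalculates χ² from the frequencies of Table 22.4 (with the cells for 11 and 12 successes pooled), evaluates P for ν = 12 by the finite series that holds for an even number of degrees of freedom, and obtains the two-tailed normal probability of a deviation of 5.13 standard errors from the complementary error function. The choice ν = 12 is an assumption on our part; it is the value that reproduces the figure 0.00072 quoted above.

```python
import math

# Table 22.4: 12 dice thrown 4096 times; 4, 5 or 6 points reckoned a success.
# The last cell pools 11 and 12 successes, as in the table.
observed = [0, 7, 60, 198, 430, 731, 948, 847, 536, 257, 71, 11]
expected = [1, 12, 66, 220, 495, 792, 924, 792, 495, 220, 66, 13]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_sq_tail_even(x: float, nu: int) -> float:
    """P(chi-square >= x) for an even number of degrees of freedom nu:
    exp(-x/2) * sum over k = 0 .. nu/2 - 1 of (x/2)^k / k!."""
    if nu % 2 != 0:
        raise ValueError("this closed form holds only for even nu")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, nu // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

p_chi = chi_sq_tail_even(chi_sq, 12)
# Two-tailed normal probability of a deviation of 5.13 standard errors
p_mean = math.erfc(5.13 / math.sqrt(2.0))

print(f"chi-square = {chi_sq:.4f}, P = {p_chi:.5f}")
print(f"two-tailed normal probability for 5.13 = {p_mean:.3g}")
```

The second probability is smaller than the first by a factor of some thousands, which is the point of the comparison: the test of the mean detects the consistent shift that χ², ignoring signs, largely misses.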

22.21. Hence, if the signs of the x's show any marked peculiarities,
it is as well to apply as many supplementary tests as are available, and
not to rely on the χ² test alone. Such tests would include those for the
significance of the mean and standard deviation, which we have already
discussed.


Levels of Significance. 

22.22. In the examples we have given above, our judgment whether P
was small enough to justify us in suspecting a significant difference between
fact and theory has been more or less intuitive. Most people would agree,
in Example 22.3, that a probability of only 0.0001 is so small that the
evidence is very much in favour of the supposition that the dice were biased.
But we shall not always get such a decisive result. Suppose we had
obtained P = 0.1, so that the odds against the event are nine to one. Is
this value small enough to lead us to suspect the dice? If it is not, would
P = 0.01 be small enough? Where, if anywhere, can we draw the line?

The odds against the observed event which influence a decision one
way or the other depend to some extent on the caution of the investigator.
Some people (not necessarily statisticians) would regard odds of ten to one
as sufficient. Others would be more conservative and reserve judgment
until the odds were much greater. It is a matter of personal taste.

22.23. There are, however, two values of P which are widely used to
provide a rough line of demarcation between acceptance and rejection of
the significance of observed deviations. These values are P = 0.05 and
P = 0.01, and are said to define 5 per cent. and 1 per cent. levels of significance.
The value P = 0.001, i.e. the 0.1 per cent. level, is also used. If we choose
to adopt these levels, our attention will be focused, not as heretofore on
the actual value of P, but on the fact whether it falls above or below the
levels of significance. To facilitate the investigation of this aspect of the
matter, R. A. Fisher has prepared tables (published in his “Statistical
Methods for Research Workers”) in a different form from those of “Tables
for Statisticians and Biometricians,” which are due to W. Palin Elderton.
The latter, as we have mentioned, give the values of P corresponding to
given values of χ² and ν. Fisher's tables give χ² corresponding to given
values of ν and P, and among those values are P = 0.05 and P = 0.01, the
significance levels.

The diagram of the Appendix expresses a similar point of view, and gives
the curves of relationship between χ² and ν for constant values of P, or, in
short, the contour lines of the surface

The diagram gives the 5 per cent. and 1 per cent. lines and also those
corresponding to the smaller probabilities P = 0.001 and 0.0001, i.e. the
0.1 per cent. and the 0.01 per cent. levels.

A value of P less than 0.05 will be said to fall below the 5 per cent. level
of significance, and so on.

Example 22.8. Let us consider the data of Exercise 8.11. In experiments
on the Spahlinger anti-tuberculosis vaccine the following results were
obtained. (As before, the figures in brackets are the independence values.)

                                Died or Seriously   Unaffected or Not      Total.
                                Affected.           Seriously Affected.

Inoculated                        6 (8.87)            13 (10.13)            19

Not inoculated or inoculated
  with control media              8 (5.13)             3 (5.87)             11

Total                            14                   16                    30


Here, χ² = 4.75 and ν = 1.

From Appendix Table 4B we have P = 0.029 approximately.

Alternatively, from Fisher's table we have, when ν = 1,

        for P = 0.05     χ² = 3.841
    and for P = 0.01     χ² = 6.635

so that, from either table, P lies between the 5 per cent. level of significance
and the 1 per cent. level.

If, therefore, we take the 5 per cent, level as appropriate to this case, 
the results are significant ; but if we are more conservative and take the 
1 per cent, level, the results are not significant. In this particular case 
the position is complicated by the relative smallness of the theoretical cell 
frequencies. 
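The figures of this example may be checked by a short calculation. What follows is a sketch under our own assumptions, not the method of the tables: for one degree of freedom, P is twice the normal tail beyond χ, so that it can be written with the complementary error function, and the χ² corresponding to a given significance level can be found by bisection. The function names are ours. Working with unrounded independence values gives χ² of about 4.74, against the 4.75 obtained above from values rounded to two decimals.

```python
import math

def chi_sq_2x2(a, b, c, d):
    """chi-square of the 2 x 2 table [[a, b], [c, d]], computed from the
    independence values (row total * column total / grand total)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    cells = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n
            total += (cells[i][j] - e) ** 2 / e
    return total

def tail_one_df(x):
    """P(chi-square >= x) for one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def critical_value_one_df(p, lo=0.0, hi=50.0):
    """chi-square whose tail probability is p (one d.f.), by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if tail_one_df(mid) > p:
            lo = mid      # tail still too large: move to larger chi-square
        else:
            hi = mid
    return (lo + hi) / 2.0

chi_sq = chi_sq_2x2(6, 13, 8, 3)          # Spahlinger vaccine data
p_value = tail_one_df(chi_sq)
crit_5 = critical_value_one_df(0.05)      # 5 per cent. level
crit_1 = critical_value_one_df(0.01)      # 1 per cent. level

print(f"chi-square = {chi_sq:.2f}, P = {p_value:.3f}")
print(f"5 per cent. point = {crit_5:.3f}, 1 per cent. point = {crit_1:.3f}")
```

Since the computed χ² lies between the two critical values, the result is significant at the 5 per cent. level but not at the 1 per cent. level, as stated.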

The Additive Property of χ².

22.24. It sometimes happens, by the repetition of experiments or 
otherwise, that we have a number of tables for similar data from different 
fields. The values of P for each may not be entirely conclusive. The 
question then arises whether we cannot obtain a value of P for the aggregate,
telling us what is the probability of getting, by random sampling, a
series of divergences from theory as great as or greater than those observed.

The question is usually answered by pooling the results to form a single 
table. But, apart from the fact that this is not always possible, we have 
already seen (Chapter 4) that pooling is likely to introduce fallacies. A 
better method is to proceed in accordance with the following general rule. 

22.25. Suppose we have a number of groups of data, each furnishing a
χ² and a ν. Add together all the χ²'s to form a single value χ₁², and all
the ν's to form a single value ν₁. The χ² test may then be applied to χ₁²
and ν₁ as if they came from a single set of cells.

The validity of this rule will be evident when we consider how the χ²
test was arrived at. The variate x in every cell is normally distributed
about a mean m, and χ₁² is the sum of the squares of quantities like
x/√m, just as χ² was. This, together with the linearity of the constraints,
which remains, was the essential part of the proof of the χ² distribution,
and hence the test remains true for χ₁² and ν₁.

Example 22.9. In Example 22.4 (inoculation against cholera on a
certain tea estate) we saw that the χ² test, although suggesting that
inoculation had some effect in immunising, did not allow us to place any
great confidence in such a conclusion. The following data give χ² and P
for six estates, including the one we have already discussed:

             χ².        P.

             9.34      0.0022
             6.08      0.014
             2.51      0.11
             3.27      0.071
             5.61      0.018
             1.59      0.21

   Total    28.40



Here only one value of P is less than 0.01, and we might be inclined to
doubt whether the association between inoculation and immunity is real.
Let us, however, add the values of χ² and of ν. We get χ₁² = 28.40 and
ν₁ = 6, there being one degree of freedom from each of the six tables.

From the diagram of the Appendix we see that for these values P is
slightly below the value 0.0001. If we require greater accuracy, from the
tables we have:

      χ².      P.

      28     0.000094
      29     0.000061

Whence by interpolation P = 0.00008 approximately, i.e. we should expect
to get a χ² as great as this only 80 times in a million. We can, therefore,
regard the results, taken together, as significant with a high degree of
confidence.
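The additive rule is easily put into practice. The sketch below is an illustration of our own: it adds the six values of χ² and the six single degrees of freedom and evaluates P for the aggregate by the finite series valid for even ν. The six individual values are our reading of the table, taken so as to add to the stated total of 28.40; the function name is hypothetical.

```python
import math

# chi-square for each of the six estates (our reading of the table);
# each 2 x 2 table contributes one degree of freedom.
chi_sq_values = [9.34, 6.08, 2.51, 3.27, 5.61, 1.59]

def chi_sq_tail_even(x: float, nu: int) -> float:
    """P(chi-square >= x) for even nu:
    exp(-x/2) * sum over k = 0 .. nu/2 - 1 of (x/2)^k / k!."""
    if nu % 2 != 0:
        raise ValueError("this closed form holds only for even nu")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, nu // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

total_chi_sq = sum(chi_sq_values)      # the aggregate chi-square
total_nu = len(chi_sq_values)          # the aggregate degrees of freedom
p_combined = chi_sq_tail_even(total_chi_sq, total_nu)

print(f"chi_1^2 = {total_chi_sq:.2f}, nu_1 = {total_nu}, P = {p_combined:.6f}")
```

The aggregate P is of the order of 80 in a million, although five of the six tables taken singly give P greater than 0.01.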

Estimation of Theoretical Frequencies from the Data. 

22.26. Our theoretical frequencies m may be calculated partly on
the basis of information from the data, partly on a priori grounds. Thus,
in the dice-throwing data of Example 22.3, our hypothesis that the dice
were unbiased enabled us to say that the chance of getting a 5 or a 6 was
1/3, and hence that the chances with 12 dice were the terms in
26,306(2/3 + 1/3)¹².
Here we take only the value of N, the total frequency, from the data.

In the association and contingency tables, the values of row and
column totals, as well as N, are taken from the data and we assume
a priori that the attributes are independent.

It may be, however, that we draw further information from the
data themselves in fixing the theoretical frequencies. In such cases an
important modification is necessary in the previous methods of work, for
the number of degrees of freedom is further restricted by each piece of
information drawn from the data, as we have already seen for contingency
tables.

22.27. Consider, for example, the dice-throwing data of Example 22.3.
We have already seen that the dice were probably biased, so that the chance
of a success was not 1/3. What, then, was it?

To answer this question we can only appeal to the data. The proportion
of 5's and 6's in the total number of throws of individual dice
(26,306 × 12) was 0.3377. Let us therefore take this to be an estimate of
the true probability. We can be confident that it will be somewhere
very close, owing to the large number in the sample. The theoretical
frequencies will then be the terms in 26,306(0.6623 + 0.3377)¹².
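The terms of this binomial may be written out as follows. The sketch is our own (the names are hypothetical, and `math.comb` requires Python 3.8 or later); it simply expands 26,306(0.6623 + 0.3377)¹² term by term, using the estimate 0.3377 given in the text.

```python
import math

N_THROWS = 26306    # number of throws of the 12 dice
N_DICE = 12
P_HAT = 0.3377      # proportion of 5's and 6's estimated from the data

def binomial_frequencies(n_throws: int, n_dice: int, p: float) -> list:
    """Terms of n_throws * (q + p)^n_dice: the theoretical number of throws
    showing 0, 1, ..., n_dice successes."""
    q = 1.0 - p
    return [
        n_throws * math.comb(n_dice, k) * p ** k * q ** (n_dice - k)
        for k in range(n_dice + 1)
    ]

theoretical = binomial_frequencies(N_THROWS, N_DICE, P_HAT)
for successes, freq in enumerate(theoretical):
    print(f"{successes:2d} successes: {freq:9.1f}")
```

The terms necessarily add to 26,306, so that fitting the chance of success costs one degree of freedom over and above the constraint imposed by the total.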

To take a second case: consider the height distribution of Table 6.7,
page 94. We have already had reason to suspect that this is a sample
from a normal population. If we suppose this hypothesis to be correct,
the question arises, What are the mean and standard deviation of the
universe? Here again we must estimate these quantities from the data,
in the manner of Chapter 20.

22.28. We shall denote values of the theoretical frequencies which are
calculated from parameters estimated from the data by the letter m′, and
the value of χ² calculated from them by χ′², so that we have:

\[ \chi'^2 = S\left\{ \frac{(\tilde{m} - m')^2}{m'} \right\} \]

m̃ being the observed frequency.

Now, χ′² is an estimate of χ² and, if the m′'s are close to the m's, χ′² will
be close to χ². χ′² is made up of two parts, one measuring the divergence
between theory and fact, the other due to errors of estimation of χ². If
the second is small compared with the first, we may expect that the χ²
test, applied with χ′² instead of the unknown χ², will continue to reveal
significant differences between theory and fact where such exist.

22.29. The question as to the precise conditions under which the
test is applicable for such cases has not been completely answered, but
it has been shown that, if the cell frequencies are large, the test still
applies subject to the following conditions:

(a) The number of degrees of freedom must be reduced by unity for
each constant of the universe which is estimated from the data.

(b) The estimates must be of the type known as “efficient.”

We shall not be able in this Introduction to go into the theory of this
important class of estimate, but it will be sufficient if we indicate that the
estimates of the mean of a normal universe, and the parameter m of the
Poisson distribution, are “efficient” if calculated in the ordinary way,
i.e. by taking the value of the parameter in the sample to be the value of
the parameter in the universe.

Example 22.10. Reverting to the data of Example 22.3, let us
estimate the true chance of getting a 5 or a 6 from the data themselves.
The frequency of the successful event is 0.3377 of the whole. This is
an “efficient” estimate of the chance. The following table gives the
observed frequencies and the theoretical frequencies calculated from the
formula 26,306(0.6623 + 0.3377)¹²:

Table 22.5. 12 Dice thrown 26,306 Times, a Throw of 5 or 6 reckoned a Success.

Number of      Observed          Theoretical        m̃ − m′    (m̃ − m′)²/m′
Successes.     Frequency (m̃).    Frequency (m′).

    0               185               187             −2         0.021
    1             1,149             1,146             +3         0.008
    2             3,265             3,215            +50         0.778
    3             5,475             5,465            +10         0.018
    4             6,114             6,269           −155         3.832
    5             5,194             5,115            +79         1.220
    6             3,067             3,043            +24         0.189
    7             1,331             1,330             +1         0.001
    8               403               424            −21         1.040
    9               105                96             +9         0.844
   10 and over       18                16             +2         0.250

  Total          26,306            26,306              0         8.201

Thus χ′² = 8.201. There are 11 cells, with one linear constraint. We have
also fitted one constant from the data, and hence we must take ν = 9.




From the diagram of the Appendix we then see that P is very close
to 0.50.

From the tables, for ν = 9 or n′ = 10, we have:

      χ².      P.

      8      0.5341
      9      0.4373

so that P = 0.51 approximately.

Thus our hypothesis is now, so far as the χ² test is concerned, in
agreement with experiment.
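The value of P in this example can be computed without interpolation. The sketch below is our own and uses the standard finite expansion of the χ² probability for an odd number of degrees of freedom, namely twice the normal tail beyond χ plus a short series in odd powers of χ. The frequencies are those of Table 22.5; the function name is hypothetical.

```python
import math

# Table 22.5: observed frequencies and theoretical frequencies fitted
# with the estimated chance 0.3377 (last cell: "10 and over").
observed = [185, 1149, 3265, 5475, 6114, 5194, 3067, 1331, 403, 105, 18]
theoretical = [187, 1146, 3215, 5465, 6269, 5115, 3043, 1330, 424, 96, 16]

chi_sq = sum((o - t) ** 2 / t for o, t in zip(observed, theoretical))

def chi_sq_tail_odd(x: float, nu: int) -> float:
    """P(chi-square >= x) for odd nu:
    erfc(chi/sqrt 2) + sqrt(2/pi) * exp(-x/2)
        * sum over r = 1 .. (nu-1)/2 of chi^(2r-1) / (1*3*...*(2r-1))."""
    if nu % 2 != 1:
        raise ValueError("this expansion holds only for odd nu")
    chi = math.sqrt(x)
    term = chi                     # r = 1 term
    series = 0.0
    for r in range(1, (nu - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return (math.erfc(chi / math.sqrt(2.0))
            + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * series)

# 11 cells, less one for the fixed total and one for the fitted chance
nu = len(observed) - 1 - 1
p_value = chi_sq_tail_odd(chi_sq, nu)

print(f"chi'^2 = {chi_sq:.3f}, nu = {nu}, P = {p_value:.3f}")
```

Had the fitted constant been overlooked and ν taken as 10, P would have come out somewhat larger; the reduction of ν by unity for each estimated parameter is what keeps the test fair.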

Experiments on the χ² Distribution.

22.30. Several statisticians have conducted experiments to verify
the theory which we have discussed in the foregoing sections. A certain
amount of work in this field remains to be done, but generally it may
be said that experiment supports the theory. So far as cases where the
m's are calculated a priori are concerned there is little doubt of its
correctness.

In one set of experiments (ref. (511)) 200 beans were thrown into
a revolving circular tray with 16 equal radial compartments and the
number of beans falling into each compartment was counted. The 16
frequencies so obtained were arranged (1) in a 4 × 4 table, and (2) in a
2 × 8 table. χ² was calculated from the independence frequencies, as in
Example 22.5.

The experiment and the calculations were repeated 100 times. The
following table exhibits the actual and the theoretical distribution
of χ².


Table 22.6. Theoretical Distribution of χ², calculated from Independence Values, in
Tables with 16 Compartments, compared with the Actual Distributions given by 100
Experimental Tables. In the first case ν must be taken as 9, in the second as 7.

                4 Rows, 4 Columns.               2 Rows, 8 Columns.
   χ².      Expectation.   Observation.     Expectation.   Observation.

   0–5         16.6            17               34.0           29.5
   5–10        48.4            44               47.1           56.5
  10–15        26.0            32               15.3           10
  15–20         7.3             6                3.0            3
  20–           1.8             1                0.6            1

  Total       100.1           100              100.0          100


In a second experiment with 2 x 2 tables 350 experimental tables of 
100 observations each were available. Table 22.7 shows the actual and 
theoretical distributions in this case. 




Table 22.7. Theoretical Distribution of χ² for a Table with 2 Rows and 2 Columns, when
χ² is calculated from the Independence Values, compared with the Actual Results for
350 Experimental Tables.

                        Number of Tables.
  Value of χ².       Expected.     Observed.

  0   –0.25           134.02          122
  0.25–0.50            48.15           54
  0.50–0.75            32.56           41
  0.75–1.00            24.21           24
  1   –2               56.00           62
  2   –3               25.91           18
  3   –4               13.22           13
  4   –5                7.05            6
  5   –6                3.86            5
  6   –                 5.01            5

  Total               349.99          350


It is interesting to see what happens if we apply the χ² test to these
tables.

In Table 22.6, grouping together the frequencies from χ² = 15 upwards,
so that ν = 3, χ² is found to be 2.27 for the 4 × 4 tables and 4.36 for the
2 × 8 tables, giving P = 0.52 in the first case and 0.22 in the second.

In Table 22.7, χ² = 7.58, ν = 9, P = 0.58.
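The "expectation" column for the 4 × 4 tables, and the test applied to it, can be reproduced as follows. This is a sketch of our own under stated assumptions: the theoretical number of tables in each class is taken as 100 times the difference of the χ² probabilities at the class limits for ν = 9, and the classes from 15 upwards are pooled so that four cells (ν = 3) remain. Because unrounded expectations are used, the resulting χ² differs slightly in the second decimal from the value 2.27 given above.

```python
import math

def chi_sq_tail_odd(x: float, nu: int) -> float:
    """P(chi-square >= x) for an odd number of degrees of freedom nu."""
    if nu % 2 != 1:
        raise ValueError("this expansion holds only for odd nu")
    chi = math.sqrt(x)
    term, series = chi, 0.0
    for r in range(1, (nu - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return (math.erfc(chi / math.sqrt(2.0))
            + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * series)

def expected_counts(nu: int, lower_limits: list, n_tables: int) -> list:
    """Theoretical number of tables whose chi-square falls in each class;
    the last class is open-ended."""
    tails = [chi_sq_tail_odd(limit, nu) for limit in lower_limits] + [0.0]
    return [n_tables * (tails[i] - tails[i + 1])
            for i in range(len(lower_limits))]

# Classes 0-5, 5-10, 10-15, and 15 upwards (pooled), 4 x 4 tables, nu = 9
expected_4x4 = expected_counts(9, [0.0, 5.0, 10.0, 15.0], 100)
observed_4x4 = [17, 44, 32, 6 + 1]

fit_chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed_4x4, expected_4x4))
p_fit = chi_sq_tail_odd(fit_chi_sq, 3)   # four cells, one constraint

print([round(e, 1) for e in expected_4x4])
print(f"chi-square of fit = {fit_chi_sq:.2f}, P = {p_fit:.2f}")
```

A value of P near one half is what should be expected if the experimental values of χ² really follow the theoretical distribution.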

Goodness of Fit. 

22.31. The χ² distribution, as we have seen, leads to tests of the
correspondence between theory and fact, and this and other reasons have
led to its being described as a test of the “goodness of fit.” This expression
may be used in two ways. In the first place, it may describe the
“fit” of observed and hypothetical data. In the second, it may be used
without reference to a hypothesis merely to provide an objective method
of estimating the merits of a particular formula or a particular curve in
graduating a set of values or a series of points.

The arithmetic in the second class of cases is exactly the same as in
the first. Conventionally, we regard very low values of P as denoting
a poor fit, and moderate values as denoting a reasonably good fit. High
values show an excellent fit, and in considering them we take no heed of
the point discussed in 22.19 (b), since we are assessing the closeness of
the curve to the data, not the probability that the first represents a universe
from which the second was derived by random sampling.






SUMMARY. 


1.
\[ \chi^2 = S\left\{ \frac{(\tilde{m} - m)^2}{m} \right\} = S\left( \frac{\tilde{m}^2}{m} \right) - N \]

where m̃ refers to the observed and m to the theoretical frequencies.

2. The number of degrees of freedom of an aggregate of cells is denoted
by ν, and is equal to the number of cells whose frequencies can be determined
at will. When ν cell frequencies are determined, the remainder are
calculable directly from the conditions to which the cell frequencies are
subjected by the nature of the data.

3. The frequency-distribution of χ² is given by

\[ y = y_0 \, e^{-\frac{1}{2}\chi^2} \chi^{\nu - 1} \]

4. From this it is possible to ascertain the probability P that on
random sampling we should get a value of χ² as great as or greater than
a given value. Tables have been constructed for this purpose.

5. The χ² distribution may be applied to data grouped in cells provided
(a) that the total number N in the sample is large, (b) that no theoretical
cell frequency is small, and (c) that the constraints are linear.

6. The value of P for any given case enables us to judge of the correspondence
between hypothesis and data.

7. When the theoretical cell frequencies have to be calculated from
parameters estimated from the data, the χ² test can be applied with χ′²
instead of χ², provided that the cell frequencies are large, the estimates
are “efficient,” and the number of degrees of freedom used in ascertaining
P is reduced by unity for every parameter which is estimated.

8. The value of P can also be used to give an objective criterion of the
“goodness of fit” of a curve to a set of points or of a formula to a set of
values.


EXERCISES. 


22.1. The following table (Weldon) gives the results of a dice-throwing
experiment:

12 Dice thrown 4096 Times, a Throw of 6 reckoned a Success.

Number of Successes     0     1      2      3     4     5     6    7 and over   Total.
Frequency              447   1145   1181   796   380   115   24        8        4096

Find χ² on the hypothesis that the dice were unbiased and hence show that
the data are consistent with this hypothesis so far as the test is concerned.







22.2. Perform an experiment by throwing a die 600 times and noting the 
number of points at each throw. Use these data to inquire whether the die 
is biased. 

22.3. 200 digits were chosen at random from a set of tables. The frequencies
of the digits were:

Digit         0    1    2    3    4    5    6    7    8    9    Total.
Frequency    18   19   23   21   16   25   22   20   21   15     200

Use the χ² test to assess the correctness of the hypothesis that the digits
were distributed in equal numbers in the tables from which these were chosen.

22.4. Perform an experiment on the lines of Exercise 22.3 by taking, say, the
last figure in 200 logarithms taken from a set of five-figure logarithm tables.

22.5. (Data: Yule, ref. (03).) Sixteen pieces of photographic paper were
printed down to different depths of colour from nearly white to a very deep
blackish brown. Small scraps were cut from each sheet and pasted on cards,
two scraps on each card one above the other, combining scraps from the several
sheets in all possible ways, so that there were 256 cards in the pack. Twenty
observers then went through the pack independently, each one naming each tint
either “light,” “medium” or “dark.”

The following table shows the name assigned to each of the two pieces of
paper:

Name assigned to         Name assigned to Upper Tint.
Lower Tint.           Light.     Medium.     Dark.       Total.

Light                   850        571        580         2001
Medium                  618        593        455         1666
Dark                    540        456        457         1453

Total                  2008       1620       1492         5120

Show that there is a significant association between the name assigned to
one piece and the name assigned to the other.

22.6. Apply the χ² test to the data of Example 3.1), page 44, and examine
the justification for the conclusions there drawn.

22.7. Show that, if ν is large, P is below the 5 per cent. level of significance if

\[ \sqrt{2\chi^2} - \sqrt{2\nu - 1} > 1.65 \]

and below the 1 per cent. level of significance if

\[ \sqrt{2\chi^2} - \sqrt{2\nu - 1} > 2.33 \]

22.8. Table 5.6, page 78, gives the number of criminals of normal and weak
intellect for various ranges of weight.

Assuming this to be a random sample of criminals, do the data support the
suggestion that weak-minded criminals are not underweight?

22.9. Show that in a 2 × 2 contingency table wherein the frequencies are

        a    b
        c    d

χ² calculated from the “independence” frequencies is

\[ \frac{(a + b + c + d)(ad - bc)^2}{(a + b)(c + d)(b + d)(a + c)} \]

22.10. Show similarly that for a 2 × n table

\[ \chi^2 = S\left\{ \frac{N_1 N_2}{n_{1r} + n_{2r}} \left( \frac{n_{1r}}{N_1} - \frac{n_{2r}}{N_2} \right)^2 \right\} \]

where n_{1r}, n_{2r} are the 2 frequencies in the rth column and N₁, N₂ are the marginal
sums of the 2 rows.

22.11. Two investigators draw samples from the same town in order to
estimate the number of persons falling in the income groups “poorer,” “middle
class,” “well to do.” (The limits of the groups are defined in terms of money
and are the same for both investigators.) Their results are as follows:

                              Income Group.
Investigator.     “Poorer.”    “Middle Class.”    “Well to do.”    Totals.

A                    140            100                15             255
B                    140             50                20             210

Totals               280            150                35             465

Show that the sampling technique of at least one of the investigators is
suspect.

22.12. Exercise 10.17 gives the number of deaths per day of women over 85
published in The Times during 1910–12. Using the theoretical frequencies
obtained in that exercise on the hypothesis that the numbers are distributed in
a Poisson series, employ the χ² test to estimate the correctness of this hypothesis.

22.13. Design and execute an experiment involving the χ² test to test the
randomicity of Tippett's numbers.

22.14. (Data: G. Mendel's classical paper on “Experiments in Plant
Hybridisation,” quoted in translation in W. Bateson's “Mendel's Principles of
Heredity.”)

In experiments on pea-breeding, Mendel obtained the following frequencies
of seeds: 315 round and yellow; 101 wrinkled and yellow; 108 round and
green; 32 wrinkled and green. Total, 556.

Theory predicts that the frequencies should be in the proportions 9 : 3 : 3 : 1.

Examine the correspondence between theory and experiment, calculating P
either directly (page 418, footnote) or by interpolation from tables.

22.15. A particular experiment gives, on hypothesis H, χ² = 9, ν = 8; when
repeated it gives the same result. Show that the two results taken together do
not give the same confidence in H as either taken separately.





CHAPTER 23.


THE SAMPLING OF VARIABLES-SMALL SAMPLES. 
The Problem. 

23.1. We now proceed to examine the theory of samples which are
not large enough to warrant the assumptions underlying the work of
Chapters 19 to 21. In particular, it will no longer be open to us to
assume (a) that the random sampling distribution of a parameter is
approximately normal, or even single-humped, or (b) that values given
by the data are sufficiently close to the universe values for us to be able
to use them in gauging the precision of our estimates.

The removal of these assumptions imposes severe restrictions on our
work, and, as we shall see, an entirely new technique is necessary to deal
with the problems for which they are not permissible. The division
between the theories of large and small samples is therefore a very real
one, though it is not always easy to draw a precise line of demarcation.
We should point out, however, that as a rule the methods of the theory
of small samples are applicable to large samples, though the reverse is
not true.

Estimates. 

23.2. In the theory of large samples we were able to take the value
of a parameter in a sample to be an estimate of that parameter in the
universe. This procedure, obvious though it seems, is not in general
valid for small samples. We must therefore discuss briefly the basis on
which estimates of given parameters are to be made.

A full investigation of this question would take us far beyond the limits
of this book. It involves matters of considerable mathematical and
philosophical complexity, some of which still form the subject of dispute
among statisticians. But in the theory of small samples the main parameters
of interest are the mean and the standard deviation (or the
variance), and we will proceed to consider these two.

Estimates of the Arithmetic Mean. 

23.3. We shall take as the estimate of the arithmetic mean the value
of the sample mean. That is to say, if we have n sample values $x_1$,
$x_2$, . . . $x_n$, our estimate $\bar{x}$ of the mean in the universe is

\[ \bar{x} = \frac{1}{n} S(x) \qquad (23.1) \]

For estimates of the mean, therefore, the practice is the same for small
samples as for large.

It may be shown that for samples from a normal universe an estimate 


obtained in this way is the “ best ” in the sense that its sampling variance 
is less than that of any other estimate of the mean. 

Estimates of the Variance. 

23.4. Let us denote the variance in the universe by $\sigma_u^2$ and the
mean by m.

If m is known, we take as an estimate of the variance the mean square
deviation of the sample about m; i.e. the estimate, which we write as $\sigma_s^2$,
is given by

\[ \sigma_s^2 = \frac{1}{n} S(x - m)^2 \qquad (23.2) \]

In general, however, we do not know the value of m, which will itself
have to be estimated. In this case equation (23.2) is no longer applicable.

23.5. If m is the universe mean and $\bar{x}$ is the sample mean, we have:

\[ S(x - m)^2 = S(x - \bar{x} + \bar{x} - m)^2 \]
\[ \qquad = S(x - \bar{x})^2 + S(\bar{x} - m)^2 \]
\[ \qquad = S(x - \bar{x})^2 + n(\bar{x} - m)^2 \]

the product term vanishing because $S(x - \bar{x}) = 0$. Hence,

\[ \sigma_s^2 = \frac{1}{n} S(x - \bar{x})^2 + (\bar{x} - m)^2 \]

The term $\frac{1}{n} S(x - \bar{x})^2$ is the variance of the sample. We see that
it differs from $\sigma_s^2$ by the term $(\bar{x} - m)^2$.

Now this term will not, in general, vanish; nor will it vanish on the
average in a large number of cases, for it is essentially positive. Hence,
if we take the variance of the sample to be an estimate of the variance
of the universe we shall involve ourselves in a systematic error of magnitude
$(\bar{x} - m)^2$.

This term is the square of the deviation of the mean of the sample
from the mean of the universe, and its average value in a large number of
samples is the variance of the mean, which we know to be equal to
$\sigma_u^2 / n$.

It seems reasonable, therefore, instead of ignoring the presence of the
term $(\bar{x} - m)^2$, to take it as equal to $\sigma_u^2 / n$. We will attempt, on this basis,
a new estimate, which we shall write ${}_c\sigma_s^2$. We have then:

\[ {}_c\sigma_s^2 = \frac{1}{n} S(x - \bar{x})^2 + \frac{\sigma_u^2}{n} \]

The value of $\sigma_u$ is unknown, but we may, as an approximation, write ${}_c\sigma_s$
instead. If we do so we get ${}_c\sigma_s^2 \left(1 - \frac{1}{n}\right) = \frac{1}{n} S(x - \bar{x})^2$, or

\[ {}_c\sigma_s^2 = \frac{1}{n - 1} S(x - \bar{x})^2 \qquad (23.3) \]

The effect of taking ${}_c\sigma_s^2$ given by equation (23.3), instead of the
variance of the sample, will thus be to eliminate the systematic error of
estimation to which we have just referred.

23.6. We may look at this in a slightly different way. Suppose we
take a large number of estimates of the variance of a universe compiled
according to equation (23.2), m being assumed known. These estimates
will fall into a distribution which is the sampling distribution of the
variance in samples of n. If, as will usually be the case, it is of the
single-humped type, we expect it to have a mean located at the true value of
the variance in the universe.

Now if we take as estimates of the variance the variance of the samples
(each about its own sample mean), the above will not be true, owing to
the small systematic shift represented by the term $(\bar{x} - m)^2$; but it will
be true of the estimates given by equation (23.3), and this is therefore
a preferable estimate to take.
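The systematic shift can be seen in a sampling experiment. The following sketch is our own illustration and makes several assumptions not found in the text: a normal universe with mean 0 and variance 1, samples of n = 5, 20,000 repetitions, and a fixed seed for the pseudo-random generator so that the run is repeatable. It averages the two estimates of variance, one with divisor n and one with divisor n − 1, each taken about the sample's own mean.

```python
import random

def variance_estimates(sample):
    """Return (sum of squares / n, sum of squares / (n - 1)), the deviations
    being measured from the sample's own mean."""
    n = len(sample)
    mean = sum(sample) / n
    sum_sq = sum((x - mean) ** 2 for x in sample)
    return sum_sq / n, sum_sq / (n - 1)

random.seed(12345)             # fixed seed: an arbitrary choice of ours
N_SAMPLES = 20000
SAMPLE_SIZE = 5                # a "small sample"

total_n = 0.0
total_n_minus_1 = 0.0
for _ in range(N_SAMPLES):
    sample = [random.gauss(0.0, 1.0) for _ in range(SAMPLE_SIZE)]
    est_n, est_n_minus_1 = variance_estimates(sample)
    total_n += est_n
    total_n_minus_1 += est_n_minus_1

avg_divisor_n = total_n / N_SAMPLES
avg_divisor_n_minus_1 = total_n_minus_1 / N_SAMPLES

print(f"average with divisor n     : {avg_divisor_n:.3f}")
print(f"average with divisor n - 1 : {avg_divisor_n_minus_1:.3f}")
```

With the universe variance equal to unity, the first average falls near (n − 1)/n = 0.8 and the second near 1, which is the systematic error and its removal respectively.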

23.7. Equation (23.3) was obtained by reasoning which does not
depend on the size of n, and strictly speaking we should take it as applicable
also to large samples. But if n is large, n and n − 1 are for all practical
purposes equal. With such samples our results are true only within the
range of the standard error, which is usually of order $1/\sqrt{n}$, and there is
little point in straining after an illusory refinement by taking n − 1 instead
of n in calculating the variance.

From a similar point of view it might be thought that since the term
$\sigma_u^2 / n$ is generally less than the square of the standard error of the variance,
it is equally idle to make allowance for it in estimating the variance.
This would be true if the term were zero on the average; but in fact it
is not, being a biased error, and we are justified in the long run in allowing
for it.

Furthermore, we may point out that the use of ${}_c\sigma_s^2$, the corrected
value obtained by allowing for the term $\sigma_u^2 / n$, is only valid on the average.
If, on random sampling, we get a sample variance greater than the universe
variance, the correction only makes matters worse, and may even lead to
an absurd result. An instance happens to occur in 23.33 below.

Degrees of Freedom of an Estimate. 

23.8. In discussing the $\chi^2$ test we introduced the notion of number
of degrees of freedom, being the number of cells in an aggregate whose
frequency could be assigned at will. We may conveniently extend this
nomenclature to estimates of parameters and particularly of variance.

We shall refer to the divisor in the estimates of equations (23.1),
(23.2) and (23.3) as the number of degrees of freedom of the estimates,
and shall write it as $\nu$. Thus, $\nu$ in equation (23.2) is n, and in equation
(23.3) is n - 1.

That this convention conforms to that adopted for the $\chi^2$ test may
easily be seen. We saw that $\nu$ is the number of cells, that is, the number
of terms contributing to the $\chi^2$ sum, less one for each constraint and one
for each parameter which had been estimated from the data. In the
quantity $S(x - m)^2$ there are n independent contributions of the type
$(x - m)^2$, and hence we may say that n is the number of degrees of freedom
of that estimate; but in the quantity $S(x - \bar{x})^2$ we have used the data to
estimate m, and hence the number of degrees of freedom is lowered by
unity, i.e. equals n - 1.
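The effect of the divisor can be checked exactly on a small scale. The following is a minimal sketch, assuming a hypothetical universe of five values and samples of three drawn with replacement; the function names are illustrative only. It averages, over every possible sample, the estimate about the known mean m with divisor n, and the estimates about the sample mean with divisors n and n - 1.

```python
from fractions import Fraction
from itertools import product


def mean(values):
    """Arithmetic mean as an exact fraction."""
    return Fraction(sum(values), len(values))


def sum_sq_about(values, centre):
    """Sum of squared deviations of the values about a given centre."""
    return sum((Fraction(v) - centre) ** 2 for v in values)


def average_estimates(universe, n):
    """Average three variance estimates over every ordered sample of n
    drawn with replacement from a small finite universe."""
    m = mean(universe)
    samples = list(product(universe, repeat=n))
    about_m = about_xbar_n = about_xbar_n1 = Fraction(0)
    for sample in samples:
        xbar = mean(sample)
        about_m += sum_sq_about(sample, m) / n        # known m, divisor n
        s = sum_sq_about(sample, xbar)
        about_xbar_n += s / n                         # sample mean, divisor n
        about_xbar_n1 += s / (n - 1)                  # sample mean, divisor n - 1
    count = len(samples)
    return about_m / count, about_xbar_n / count, about_xbar_n1 / count


universe = [1, 2, 3, 4, 10]  # hypothetical universe, chosen only for illustration
sigma2 = sum_sq_about(universe, mean(universe)) / len(universe)
known_m, div_n, div_n1 = average_estimates(universe, 3)
print(sigma2, known_m, div_n, div_n1)
```

In this sketch the first and third averages coincide with the variance of the universe, while the second falls short of it by the factor (n - 1)/n, which is the systematic shift discussed in 23.6.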

Tests of Significance. 

23.9. It cannot be over-emphasised that estimates from small samples 
are of little value in indicating the true value of the parameter which is 
estimated. Some estimates will be better than others, but no estimate is 
very reliable. In the present state of our knowledge this is particularly 
true of samples from universes which are suspected not to be normal. 

Nevertheless, circumstances sometimes drive us to base inferences, 
however tentatively, on scanty data. In such cases we can rarely, if ever, 
make any confident attempt at locating the value of a parameter within 
serviceably narrow limits. For this reason we are usually concerned, in 
the theory of small samples, not with estimating the actual value of a 
parameter, but in ascertaining whether observed values can have arisen 
by sampling fluctuations from some value given in advance. For example, 
if a sample of ten gives a correlation coefficient of +0.1, we shall inquire,
not the value of the correlation in the parent universe, but, more generally,
whether this value can have arisen from an uncorrelated universe, i.e.
whether it is significant of correlation in the parent.

23.10. The remainder of this chapter will accordingly be devoted to a
brief discussion of various tests of significance. Within this book we 
shall not have space to deal with these tests as fully as we should like ; but 
our account of sampling methods would be incomplete without some 
reference to sundry results of great intrinsic interest and importance in 
the field of small samples. 

The Assumption of Normality. 

23.11. We have already considered one test of significance, that
given by the distribution of $\chi^2$. This is one of the simplest and most
general tests known; but the student will recall that it depends on the
assumption that the theoretical distribution of cell frequencies in each cell
is normal. This is justified under the conditions laid down in 22.18.

In the tests which we shall now discuss we are similarly compelled to 
make some assumption about the nature of the parent universe, although 
we shall no longer be able to lay down analogous conditions on the
arrangement of the data under which the assumption is justified. We shall
specifically assume that the parent universe is normal unless otherwise 
stated. 

23.12. Our results will, therefore, be strictly true only for the normal 
universe. Some experiments have been made to throw light on the 
question whether they are true for other types of universe. It appears 
that, provided the divergence of the parent from normality is not too great, 
the results which are given below as true for normal universes are true to a 
large extent for other universes. But the whole situation is obscure, and 
it is to be hoped that in time investigators will be able to engage in the 
labour of a closer inquiry. In any case, if there is any good reason to
suspect that the parent is markedly skew, e.g. U- or J-shaped, the methods
of the succeeding sections cannot be applied with any confidence.

23.13. We may direct attention to one further point on which caution
is necessary. In the theory of large samples we recommended the student
to base his conclusions on a range of six times the standard error, and
pointed out that for normal universes the probability of deviations from
the true value outside this range was less than 3 in 1000. One can feel
great confidence in conclusions supported by probabilities of this order.
But in the theory of small samples it is, as a rule, necessary to use larger
probabilities, say, of one in 20 or one in 100, e.g. the 1 per cent. and 5 per
cent. levels of P in the $\chi^2$ test. The force of inferences based on
probabilities of this order is not so great as before, and the student should bear
this fact in mind.

23.14. For a known parent universe, and in particular for a normal
parent, it is not difficult to find expressions for the random sampling
distribution of the commoner parameters such as the mean and standard
deviation. But these distributions, even when mathematically tractable,
will in general contain certain parent values. For instance, the sampling
distribution of the means of samples of n from a normal universe with
mean m and standard deviation $\sigma$ is also normal with mean m and standard
deviation $\sigma/\sqrt{n}$. In the cases which we wish to consider, n is not large
enough for us to take estimates of m and $\sigma$ from the sample to find the
sampling distribution to any close degree of approximation.

It is, however, a remarkable fact that we can construct certain
parameters whose sampling distributions are either independent of, or dependent
on only one of, the constants of the parent. We will proceed to consider two
important distributions of this kind, the so-called t-distribution, due to
“Student,” and the z-distribution, due to R. A. Fisher.


The t-Distribution.

23.15. Writing, as before,

$$\bar{x} = \frac{1}{n} S(x), \qquad {}_c\sigma_s^2 = \frac{S(x - \bar{x})^2}{n - 1},$$

let us define a new parameter t by the equation

$$t = \frac{\bar{x} - m}{{}_c\sigma_s}\sqrt{\nu + 1} \qquad (23.4)$$

where $\nu = n - 1$ and m is the mean of the universe.

We shall refer to $\nu$ as the number of degrees of freedom of t.

Then it may be shown that, for samples of n from a normal population,
the distribution of t is given by

$$y = \frac{y_0}{\left(1 + \dfrac{t^2}{\nu}\right)^{\frac{\nu + 1}{2}}} \qquad (23.5)$$
23.16. We will imagine $y_0$ chosen so that the area of the curve given
by equation (23.5) is unity. Then, precisely as for the $\chi^2$ distribution,
the probability $P_S$ that, on random sampling, we shall get a value of t not
greater than some value $t_0$ is the area of the curve to the left of the ordinate
at the point $t_0$. We may write this

$$P_S = \int_{-\infty}^{t_0} \frac{y_0\, dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{\frac{\nu + 1}{2}}} \qquad (23.6)$$

Similarly, the probability that we get a value of t between the limits
$t_1$ and $t_2$ is given by

$$P = \int_{t_1}^{t_2} \frac{y_0\, dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{\frac{\nu + 1}{2}}} \qquad (23.7)$$
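The integrals (23.6) and (23.7) can be evaluated numerically when tables are not to hand. The following is a minimal sketch, assuming the curve (23.5) has the form $y_0/(1 + t^2/\nu)^{(\nu+1)/2}$ and taking for $y_0$ the standard gamma-function constant that makes the total area unity (the text does not state this constant; it is supplied here as an assumption). The function names are illustrative.

```python
import math


def y0(nu):
    """Constant making the total area under the curve unity
    (standard result, assumed here; not given in the text)."""
    return math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))


def ordinate(t, nu):
    """Ordinate of the t curve for nu degrees of freedom."""
    return y0(nu) / (1 + t * t / nu) ** ((nu + 1) / 2)


def p_s(t0, nu, steps=2000):
    """Area to the left of t0, by Simpson's rule on [0, |t0|] and symmetry
    about t = 0. `steps` must be even."""
    upper = abs(t0)
    h = upper / steps
    total = ordinate(0.0, nu) + ordinate(upper, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * ordinate(i * h, nu)
    half = total * h / 3
    return 0.5 + half if t0 >= 0 else 0.5 - half


def p_between(t1, t2, nu):
    """Probability of a value of t between t1 and t2."""
    return p_s(t2, nu) - p_s(t1, nu)


print(round(p_s(1.89, 9), 3))
print(round(p_between(-1.0, 1.0, 1), 3))
```

For $\nu = 1$ the curve reduces to $1/\{\pi(1 + t^2)\}$, for which the area between -1 and +1 is exactly one half; this gives a convenient check on the routine.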


Form of “ Student’s ” Distribution. 


23.17. The curves given by equation (23.5) are easy to study. Clearly
they are symmetrical about t = 0, since only even powers of t appear in their
equation. Further, since $1/(1 + t^2/\nu)^{(\nu+1)/2}$ decreases as t increases
numerically, the curves will have a mode (coinciding, of course, with the mean)
at t = 0, and will tail off to infinity on each side. They will, in fact, be
symmetrical single-humped curves rather like the normal curve, only more
leptokurtic.

As $\nu$ tends to infinity, $1/(1 + t^2/\nu)^{(\nu+1)/2}$ tends to $e^{-t^2/2}$, and
hence t is distributed normally. This fact enables us to use the tables of the
normal integral to evaluate P approximately when $\nu$ is large.

23.18. At the end of this book we reproduce by permission tables of
the integral (23.6) calculated by “Student” himself (Appendix Table 5).
These have been reduced to three places of decimals from the original four.

Tables of rather a different form have been given in Tables for
Statisticians and Biometricians, Part I, and by R. A. Fisher, and to
avoid possible confusion we point out where these tables differ.

“Tables for Statisticians, etc.,” gives the values of

$$\int_{-\infty}^{z} \frac{y_0\, dz}{(1 + z^2)^{\frac{\nu + 1}{2}}}$$

where $z = t/\sqrt{\nu}$, for $\nu$ from 1 to 9. These values (which were also
calculated by “Student”) are of the same kind as, but more limited in range
than, those of our table.

R. A. Fisher, in his “Statistical Methods for Research Workers,” adopts
the standpoint we have already noticed in discussing the $\chi^2$ distribution
(Chapter 22), and gives values of t corresponding to various values of $\nu$
and the 5 per cent. and 1 per cent. levels of a third probability $P_F$.

$P_S$ and $P_F$ are simply related. $P_S$ is the probability that an observed
value will not exceed $t_0$. $P_F$ is the probability that an observed value of t,
regardless of sign, will exceed $t_0$.

Hence,

$P_S$ = Area of curve to the left of ordinate $t_0$
$P_F$ = Area to right of $t_0$ + area to left of $-t_0$
      = 2 (Area to right of $t_0$) (since the curve is symmetrical)
      = $2(1 - P_S)$ . . . . (23.8)

The student should keep these relations in mind, particularly when
thinking of levels of significance. In Fisher's sense a value of t will fall
below the 5 per cent. level if $P_F$ is less than 0.05. This implies that $P_S$ is
greater than 0.975, not 0.95. 1

Applications of “ Student’s ” Distribution. 

23.19. We proceed to give one or two examples of the way in which
the “Student” distribution is generally used to test the significance of
various results obtained from small samples.

Example 23.1.—Ten individuals are chosen at random from a population
and their heights are found to be, in inches, 63, 63, 66, 67, 68, 69,
70, 70, 71 and 71. In the light of these data, to discuss the suggestion
that the mean height in the universe is 66 inches.


In the first place, let us note that the universe is likely to be 
approximately normal, from our knowledge of height distributions, and 
the sampling is random. 

In the sample we find that

$$\bar{x} = 67.8 \text{ inches}$$

and

$${}_c\sigma_s = 3.011 \text{ inches}$$

Let us now calculate t from equation (23.4), taking m to be 66 inches.
We have:

$$t = \frac{67.8 - 66}{3.011}\sqrt{10} = 1.89$$

From the Appendix Table 5 (column $\nu = 9$):

for t = 1.8, P = 0.947
for t = 1.9, P = 0.955

Hence,

for t = 1.89, P = 0.954


1 A comparison of the tables is not made any easier by the fact that “Student”
and Fisher use n to denote the degrees of freedom, whereas “Tables for Statisticians” uses
it to denote the number in the sample. We noted the same conflict in the $\chi^2$ tables.
We hope here that the use of a separate symbol $\nu$ will remove a good deal of the
confusion.

The distinction between $P_S$ and $P_F$ did not arise in Chapter 22 because $\chi^2$ is essentially
positive.

Thus the chance of getting a value of t greater than that observed is
1 - 0.954, i.e. 0.046, or about one in twenty. The probability of getting t
greater in absolute value is 0.092, or about one in ten. We should hardly
regard this as significant; but if we did, we should argue that as the
observed value of t is improbable, the initial assumptions on which we
obtained it were incorrect; and this in turn suggests that there is some
doubt about the true mean being 66 inches.
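The arithmetic of Example 23.1 may be retraced as follows. This is a sketch only: the ten heights are taken as 63, 63, 66, 67, 68, 69, 70, 70, 71 and 71 (the reading that agrees with the stated mean of 67.8 and corrected standard deviation of 3.011), and the probability is obtained by linear interpolation between the two tabular values quoted in the example rather than from the full table.

```python
import math

# Heights in inches, as read from Example 23.1 (see the assumption above).
heights = [63, 63, 66, 67, 68, 69, 70, 70, 71, 71]
m = 66                                   # hypothetical mean of the universe

n = len(heights)
nu = n - 1                               # degrees of freedom
xbar = sum(heights) / n
s_corrected = math.sqrt(sum((x - xbar) ** 2 for x in heights) / nu)

# Equation (23.4): t = (xbar - m) / s * sqrt(nu + 1)
t = (xbar - m) / s_corrected * math.sqrt(nu + 1)

# Linear interpolation between the tabular values quoted in the example
# for nu = 9: P = 0.947 at t = 1.8 and P = 0.955 at t = 1.9.
p_s = 0.947 + (t - 1.8) / 0.1 * (0.955 - 0.947)

# Equation (23.8): probability of a value greater in absolute value.
p_f = 2 * (1 - p_s)

print(round(xbar, 1), round(s_corrected, 3), round(t, 2))
print(round(p_s, 3), round(1 - p_s, 3), round(p_f, 3))
```

The two probabilities printed last correspond to the one-sided chance of about one in twenty and the two-sided chance of about one in ten discussed in the example.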

Example 23.2.—(Voelcker’s data quoted by “Student,” Biometrika,
vol. 6, 1908-9, p. 19.)

Voelcker grew certain crops of potatoes dressed (a) with sulphate of
potash, and (b) with kainite. In four experiments, two in each of 1904
and 1905, the differences in yields per acre (sulphate plot less kainite
plot) were:

0.5464 ton
0.3013 „
1.5241 „
0.6786 „

This suggests that sulphate of potash is a better manure than kainite.
Required to discuss the question.

From our knowledge of crop yields we expect them to be distributed
in a single-humped form not very far removed from the normal. Let us
suppose that the two manures have the same effect on yield. Then the
differences of plots will be distributed in an approximately normal form
about zero mean.

The mean of the four differences is 0.7626 ton, and we find $_c\sigma_s = 0.5312$.
Hence,

$$t = \frac{0.7626 - 0}{0.5312}\sqrt{4} = 2.871$$

From the tables, for $\nu = 3$, P = 0.968 approximately.

Hence the chance of getting a value of t greater than that observed
is about 1 in 31. The chance of getting a value greater absolutely than
the observed value is 0.06. If we choose to regard this as significant,
we are led to suspect our hypothesis that the two manures exert equal
influences on yield, and hence to suppose, though with little confidence
so far as these data are concerned, that sulphate of potash is the better
manure.
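Example 23.2 may be checked in the same manner. In this sketch the four differences are taken as 0.5464, 0.3013, 1.5241 and 0.6786 ton, the reading that reproduces the stated mean of 0.7626 and corrected standard deviation of 0.5312. For three degrees of freedom the integral (23.6) has a simple closed form, which is used here in place of the tables; this closed form is a standard result and is not given in the text.

```python
import math

# Differences in yield (tons per acre), as read from Example 23.2.
differences = [0.5464, 0.3013, 1.5241, 0.6786]

n = len(differences)
nu = n - 1
mean_diff = sum(differences) / n
s_corrected = math.sqrt(sum((d - mean_diff) ** 2 for d in differences) / nu)

# Equation (23.4) with m = 0, the hypothesis of no difference between manures.
t = (mean_diff - 0) / s_corrected * math.sqrt(n)


def p_s_three_df(t0):
    """Area of the t curve to the left of t0 for nu = 3 (closed form)."""
    u = t0 / math.sqrt(3)
    return 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi


p_s = p_s_three_df(t)
print(round(mean_diff, 4), round(s_corrected, 4), round(t, 3))
print(round(p_s, 3), round(1 - p_s, 3), round(2 * (1 - p_s), 3))
```

The last line gives the one-sided and two-sided chances; with so few degrees of freedom even a value of t near 2.9 leaves a two-sided chance of about 0.06.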

23.20. The student who wishes to apply the t-distribution for
himself is advised to make a careful study of the logic of the argument
underlying the inferences we have drawn in the foregoing two examples.

In Example 23.1 we saw that the chance of getting a value of t less
than 1.89 is approximately 0.954. This is not the same thing as saying
that the probability of a deviation in the sample mean of 1.8 inches or
less is 0.954. In fact, we do not know this probability, and the smallness
of the sample prevents us from approximating to it with any closeness.
It might happen that $\sigma$ in the universe was such that a deviation of
1.8 inches was not at all improbable. The relative improbability of t
would then be due to deviations of $_c\sigma_s$ from $\sigma$.


Comparison of Two Samples. 

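23.21. Suppose we have two samples $x_1, x_2, \ldots, x_{n_1}$ and
$x'_1, x'_2, \ldots, x'_{n_2}$. Let us, as before, define

$$\bar{x}_1 = \frac{1}{n_1} S(x), \qquad \bar{x}_2 = \frac{1}{n_2} S(x') \qquad (23.9)$$

Let us further define

$${}_c\sigma_s^2 = \frac{1}{n_1 + n_2 - 2}\left\{S(x - \bar{x}_1)^2 + S(x' - \bar{x}_2)^2\right\} \qquad (23.10)$$

If the two samples come from the same universe, $_c\sigma_s^2$ will be an estimate
of $\sigma^2$. It has, as we might expect, $n_1 + n_2 - 2$ degrees of freedom, since
both $\bar{x}_1$ and $\bar{x}_2$ are calculated from the data.

Let us write

$$\nu = n_1 + n_2 - 2 \qquad (23.11)$$

and define

$$t = \frac{\bar{x}_1 - \bar{x}_2}{{}_c\sigma_s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{\bar{x}_1 - \bar{x}_2}{{}_c\sigma_s}\sqrt{\frac{n_1 n_2}{n_1 + n_2}} \qquad (23.12)$$

Then it may be shown that t, as so defined, is distributed according to
the form of equation (23.5) with $\nu$ degrees of freedom.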

Example 23.3.—(Data from R. A. Fisher, Metron, vol. 5, 1925, p. 95.)

Eight pots growing three barley plants each were exposed to a high
tension discharge, while nine similar pots were enclosed in an earthed
wire cage. The numbers of tillers in each pot were as follows:—

Caged . . . . 17, 27, 18, 25, 27, 29, 27, 23, 17

Electrified . . 16, 16, 20, 16, 20, 17, 15, 21

We are interested in the question whether electrification exercises any
real effect on the tillering.

We find

$$\bar{x}_1 = 23.333, \qquad \bar{x}_2 = 17.625, \qquad \bar{x}_1 - \bar{x}_2 = 5.708$$

$${}_c\sigma_s^2 = \frac{221.875}{15} = 14.7917, \qquad {}_c\sigma_s = 3.846$$

$$t = \frac{5.708}{3.846}\sqrt{\frac{72}{17}} = 3.05$$

$$\nu = 8 + 9 - 2 = 15$$

From the tables we find that $P_S = 0.996$.

Hence, if the samples came from the same universe, they furnish a
value of t which is improbable; an absolutely greater value would arise
only 8 times in a thousand. We therefore suspect that the universes are
different, i.e. that electrification does exert some effect on the tillering.
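The working of Example 23.3 can be set out step by step. The following is a sketch under the assumption that the electrified series reads 16, 16, 20, 16, 20, 17, 15, 21 (the reading that gives the stated mean of 17.625 and the pooled sum of squares 221.875); the variable names are illustrative.

```python
import math

caged = [17, 27, 18, 25, 27, 29, 27, 23, 17]
electrified = [16, 16, 20, 16, 20, 17, 15, 21]


def mean(values):
    return sum(values) / len(values)


def sum_sq(values):
    """Sum of squared deviations about the sample's own mean."""
    centre = mean(values)
    return sum((v - centre) ** 2 for v in values)


n1, n2 = len(caged), len(electrified)
nu = n1 + n2 - 2                                  # equation (23.11)

# Equation (23.10): pooled estimate of the variance.
pooled_ss = sum_sq(caged) + sum_sq(electrified)
pooled_var = pooled_ss / nu
pooled_sd = math.sqrt(pooled_var)

# Equation (23.12).
t = (mean(caged) - mean(electrified)) / pooled_sd * math.sqrt(n1 * n2 / (n1 + n2))

print(round(mean(caged), 3), round(mean(electrified), 3))
print(round(pooled_ss, 3), round(pooled_var, 4), round(pooled_sd, 3))
print(nu, round(t, 2))
```

The value of t so found is then referred to the table for 15 degrees of freedom, as in the example.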

23.22. In applying the t-distribution to two samples as in the preceding
example one further point should be borne in mind. It does not follow
from a significant value of t that the samples come from universes which
have different means. Samples from two universes with the same means
and different standard deviations would also furnish significant t’s on
occasion. We can test whether this is so by the method of 23.24 below.

Significance of Regression Coefficients. 

23.23. R. A. Fisher has shown that the “ Student ” distribution can 
be applied to test the significance of regression coefficients and also of 
certain curvilinear regressions. We have not the space here to give a
discussion of these results, but the reader is referred to ref. (536) for further
particulars. A test of the significance of correlation coefficients is given 
below (23.34 to 23.39). 


Fisher’s z-Distribution.

23.24. Suppose that we have two samples, as in 23.21, with estimated
variances $_c\sigma_1^2$ and $_c\sigma_2^2$ as defined in equation (23.9).

Put

$$z = \frac{1}{2}\log_e \frac{{}_c\sigma_1^2}{{}_c\sigma_2^2} \qquad (23.13)$$

and write

$$\nu_1 = n_1 - 1, \qquad \nu_2 = n_2 - 1 \qquad (23.14)$$

so that $\nu_1$ and $\nu_2$ are the degrees of freedom of the estimates
$_c\sigma_1^2$ and $_c\sigma_2^2$.

Then R. A. Fisher has shown that, if the samples come from the same
universe and that universe is normal, z is distributed according to the law

$$y = y_0\, \frac{e^{\nu_1 z}}{\left(\nu_1 e^{2z} + \nu_2\right)^{\frac{\nu_1 + \nu_2}{2}}} \qquad (23.15)$$

As usual, we take $y_0$ so that the area of the curve is unity, and the
probability that we get a given value $z_0$ or greater on random sampling
will be given by the area to the right of the ordinate at $z_0$.

23.25. This probability is not easy to tabulate owing to the fact that
it depends upon the two numbers $\nu_1$ and $\nu_2$. Fisher has therefore
prepared tables showing the 5 per cent. and 1 per cent. significance points of z,
and a further table of the 0.1 per cent. points has been given by Colcord and
Deming. These tables are reproduced by permission in Appendix Tables
6A, 6B and 6C. For practical purposes they are sufficient to enable the
significance of an observed value of z to be gauged. If the exact value of
the probability of obtaining a given value of z or greater is required, use
may sometimes be made of the tables of the incomplete beta-function
(ref. (600)).

Example 23.4.—Consider again the data of Example 23.3.

Here, as always, it is convenient to take the suffix 1 to refer to the
larger of the two estimates of variance.

We have:

$${}_c\sigma_1^2 = 23, \qquad {}_c\sigma_2^2 = 5.4107$$

$$z = \frac{1}{2}\log_e \frac{23}{5.4107} = 0.724$$

$$\nu_1 = 8, \qquad \nu_2 = 7$$

From Appendix Table 6A we see that for these degrees of freedom the
5 per cent. significance value of z is 0.6576. From Table 6B the 1 per cent.
value is 0.9614.

The observed z lies between these two and is thus of rather doubtful
significance.
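The calculation of z in Example 23.4 is short enough to reproduce directly. In this sketch the two series of Example 23.3 are used (the electrified series being read as 16, 16, 20, 16, 20, 17, 15, 21), the suffix 1 is assigned to the larger estimate as the example recommends, and the two significance points are simply those quoted from Appendix Tables 6A and 6B.

```python
import math

caged = [17, 27, 18, 25, 27, 29, 27, 23, 17]
electrified = [16, 16, 20, 16, 20, 17, 15, 21]


def corrected_variance(values):
    """Variance about the sample mean with divisor n - 1, with its degrees of freedom."""
    n = len(values)
    centre = sum(values) / n
    return sum((v - centre) ** 2 for v in values) / (n - 1), n - 1


estimates = sorted(
    [corrected_variance(caged), corrected_variance(electrified)], reverse=True
)
(var1, nu1), (var2, nu2) = estimates      # suffix 1 refers to the larger variance

z = 0.5 * math.log(var1 / var2)           # equation (23.13)

five_per_cent_point = 0.6576              # quoted from Appendix Table 6A
one_per_cent_point = 0.9614               # quoted from Appendix Table 6B

print(round(var1, 4), round(var2, 4), nu1, nu2, round(z, 3))
print(five_per_cent_point < z < one_per_cent_point)
```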

The Analysis of Variance. 

23.26. This is the name given to a process now frequently applied, 
mainly in agricultural experiments. For a full treatment we must refer 
the reader to those works dealing with the latter subject; here we will do 
no more than attempt to explain the general principles of the method. 

Suppose we have n varieties of barley and desire to determine whether
they differ significantly in yield per acre. It would be no good growing
just one plot of each and comparing yields, for soil is very variable and we
should have no idea whether any observed differences in yield were due to
differences in variety or to differences in soil or some other such factor.

Let us then grow k plots of the same size for each variety. We shall
then have data to determine the standard error of the mean yield for each
variety and so the standard error of each difference of mean yield. But
the process may be simplified. If we scatter the plots well in amongst one
another, preferably at random, we may expect that fluctuations in soil
from plot to plot will affect all varieties to about the same extent, and
consequently the standard deviations of the varieties will not differ
significantly owing to soil influences.

Let $\sigma_1, \sigma_2, \ldots, \sigma_p, \ldots, \sigma_n$ be the standard deviations of the yields
of the several varieties and

$$\sigma_v^2 = \frac{S(\sigma_p^2)}{n} \qquad (23.16)$$

Supposing for simplicity that n is large enough for us to be able to ignore
the correction of equation (23.3), we may, on the hypothesis that the
yields of different varieties are equal, take $\sigma_v^2$ to be an estimate of the value
of the variance of a variety.

Also, if $\bar{x}$ be the general mean of all yields and $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p, \ldots, \bar{x}_n$
be the means of the several varieties, the variance of the means is given by

$$\sigma_m^2 = \frac{1}{n} S(\bar{x}_p - \bar{x})^2 \qquad (23.17)$$

Now the variance of the distribution of means of samples of k is
$\sigma_v^2/k$. Hence, if

$$\sigma_m^2 > \frac{\sigma_v^2}{k} \quad \text{or} \quad k\sigma_m^2 > \sigma_v^2 \qquad (23.18)$$

significantly, we may take it that the varieties do differ significantly in
yield.

23.27. If $\sigma_t^2$ be the variance of yield of all the plots taken together
without regard to variety, we have a simple relation between $\sigma_t^2$, $\sigma_m^2$
and $\sigma_v^2$.

In fact, for any one variety, the sum of squares of deviations from the
general mean is

$$k\{\sigma_p^2 + (\bar{x}_p - \bar{x})^2\}$$

and hence, summing for all varieties and dividing by nk, we have:

$$\sigma_t^2 = \sigma_m^2 + \sigma_v^2 \qquad (23.19)$$

In this way we have analysed the variance of the total into two
components, the variance of the means and the variance within the varieties.
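The step from the sum of squares for a single variety to equation (23.19) may be written out in full. The following is a sketch in which $x_{rp}$ denotes the yield of the rth plot of the pth variety and $\sigma_t^2$ the variance of all nk plots about the general mean; both labels are assumed here for the purpose of the derivation. The cross-product term vanishes because deviations from a variety's own mean sum to zero.

```latex
\begin{align*}
\sum_{r=1}^{k}(x_{rp}-\bar{x})^2
  &= \sum_{r=1}^{k}(x_{rp}-\bar{x}_p)^2
     + 2(\bar{x}_p-\bar{x})\sum_{r=1}^{k}(x_{rp}-\bar{x}_p)
     + k(\bar{x}_p-\bar{x})^2 \\
  &= k\sigma_p^2 + k(\bar{x}_p-\bar{x})^2 , \\[4pt]
\sigma_t^2 = \frac{1}{nk}\sum_{p=1}^{n}\sum_{r=1}^{k}(x_{rp}-\bar{x})^2
  &= \frac{1}{n}\sum_{p=1}^{n}\sigma_p^2
     + \frac{1}{n}\sum_{p=1}^{n}(\bar{x}_p-\bar{x})^2
   = \sigma_v^2 + \sigma_m^2 .
\end{align*}
```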

23.28. It is convenient to arrange the results we have just obtained
in the form of a table. The student will have no difficulty in recognising
that, although we have talked of plots of barley to fix the ideas, similar
analysis applies to any data in which we have n classes each of k members.

Since we want finally to compare $\sigma_v^2$ with $k\sigma_m^2$, and not with $\sigma_m^2$, it
will be more convenient to put $k\sigma_m^2$ rather than $\sigma_m^2$ itself in a summary
table (Table 23.1).

In the second sum of column 3, the summation is understood to relate
to the squares of deviations of individuals from the mean of classes in
which they occur, i.e. $S_{r=1}^{nk}(x_r - \bar{x}_p)^2$ is an abbreviation for

$$S_{p=1}^{n}\left\{ S_{r=1}^{k} (x_{rp} - \bar{x}_p)^2 \right\}$$

$x_{rp}$ being the rth member of the pth class.



Table 23.1.

1. Sums relating to Variation | 2. Divisor | 3. Sums | 4, 5. Quotients
Between class means | $n$ | $k\,S_{p=1}^{n}(\bar{x}_p - \bar{x})^2$ | $k\sigma_m^2$
Within classes | $nk$ | $S_{r=1}^{nk}(x_r - \bar{x}_p)^2$ | $\sigma_v^2$
Total | $nk$ | $S_{r=1}^{nk}(x_r - \bar{x})^2$ | $\sigma_t^2$




As a check, we note that the first two items in column 3 must add up 
to the third. In actual practice it is customary to use this fact to deduce 
the second from the other two, and not work them out independently. 

23.29. Let us take the following data as an illustration, and an
illustration only, for (1) n is not large, and (2) the data are a mere extract from an
experiment on a much larger scale with 18, not 6, plots to each variety.

Table 23.2.—Yield of Grain in grammes on Plots of Barley of One Square Yard, there
being Five Varieties and Six Plots of Each. (Data quoted by Engledow and Yule,
“The Principles and Practice of Yield Trials,” 1926.)

(The tabular arrangement does not, of course, represent the physical
lay-out of the plots.)

Plot Number | Variety 1 | Variety 2 | Variety 3 | Variety 4 | Variety 5 | Mean
1 | 387 | 372 | 350 | 340 | 398 | 369.4
2 | 420 | 455 | 417 | 360 | 358 | 402.0
3 | 353 | 375 | 400 | 358 | 334 | 364.0
4 | 331 | 328 | 325 | 370 | 340 | 338.8
5 | 358 | 383 | 378 | 395 | 320 | 366.8
6 | 400 | 308 | 275 | 375 | 430 | 357.6
Mean | 374.8 | 370.2 | 357.5 | 366.3 | 363.3 | 366.4

The mean of the whole, $\bar{x}$, is 366.4. The sums of squares of deviations
from this mean may be found in the usual way, and the calculation
simplified by taking a working mean at, say, 366.




We find, to the nearest unit,

$$S_{r=1}^{nk}(x_r - \bar{x})^2 = 43{,}934$$

Similarly,

$$k\,S_{p=1}^{n}(\bar{x}_p - \bar{x})^2 = 1{,}043$$

Hence the table of the analysis is as follows:—

Table 23.3.

1. Sums relating to Variation | 2. Divisor | 3. Sums | 4, 5. Quotients
Between class means | 5 | 1,043 | $k\sigma_m^2$ = 209
Within classes | 30 | 42,891 | $\sigma_v^2$ = 1,430
Total | 30 | 43,934 | $\sigma_t^2$ = 1,464


We see that $\sigma_v^2$ is very much greater than $k\sigma_m^2$, and the magnitude of
the difference suggests that it is due to some real cause.

We should probably infer that, since the variability within a variety
is greater than that between means of varieties, no significance can be
attached to differences between the latter.

23.30. But the process of the previous section is not very accurate
with samples so small as those with which we have been dealing. The
corrected variances, based on degrees of freedom, not the number of
observations, should be used (cf. equation (23.3)). This gives a more
complex appearance to the arithmetic, but the principles are similar. The
student will probably find the determination of the degrees of freedom his
principal difficulty.

There are n class means, so that the number of degrees of freedom in
the variance between class means is n - 1. There are k members in each
class (degrees of freedom k - 1), and n classes, total n(k - 1) degrees of
freedom in the variance within varieties. For all classes together there
are nk observations and hence nk - 1 degrees of freedom.

But

$$(nk - 1) = (n - 1) + n(k - 1)$$

and hence the degrees of freedom check by addition in the same way as
the sums of squares.

23.31. Our general table now takes the form of Table 23.4,
where we have used the symbols $_c\sigma_m^2$, $_c\sigma_v^2$, $_c\sigma_t^2$ to denote the variances
corrected as in equation (23.3).

The student should note that these corrected variances are not additive.
Nevertheless, it is common to refer to a process of analysis such as this
as the “analysis of variance.” Strictly speaking, perhaps, this is a
misnomer. It is only the sum of column 3 which is analysed into
component sums.



Table 23.4.

1. Sums relating to Variation | 2. Divisor (Degrees of Freedom) | 3. Sums of Squares | 4, 5. Quotients
Between class means | $n - 1$ | $k\,S_{p=1}^{n}(\bar{x}_p - \bar{x})^2$ | $k\,{}_c\sigma_m^2$
Within classes | $n(k - 1)$ | $S_{r=1}^{kn}(x_r - \bar{x}_p)^2$ | ${}_c\sigma_v^2$
Total | $nk - 1$ | $S_{r=1}^{kn}(x_r - \bar{x})^2$ | ${}_c\sigma_t^2$


23.32. In small samples the significance of the difference of $k\,{}_c\sigma_m^2$
and $_c\sigma_v^2$ can be ascertained by the z test, the appropriate degrees of
freedom being those of column 2.

In fact, if the classes exercise no effect on the variate values of their
members, so that the nk members can be regarded as a homogeneous
set grouped at random into n classes, $k\,{}_c\sigma_m^2$ and $_c\sigma_v^2$ will be estimates
of the variance in the universe. Further, if the parent universe is normal
those estimates will be independent, for errors of estimation in the means
of classes will be independent of errors in the variances within classes. 1
All the conditions for the application of the z test therefore obtain. If
the test reveals no significance in the difference between $k\,{}_c\sigma_m^2$ and $_c\sigma_v^2$,
we conclude that, so far as this approach shows, the class does not exert
any distinguishing effect on its members. If, on the other hand, the
difference is shown to be significant, the class does exert some influence.

Two cases may arise, according as $k\,{}_c\sigma_m^2$ is greater than, or less than,
$_c\sigma_v^2$. It may be shown that these cases correspond to the existence of
positive or negative intraclass correlation (13.2 ( )).

23.33. Table 23.3, with corrected variances, now becomes:

Table 23.5.

1. Sums relating to Variation | 2. Degrees of Freedom | 3. Sums of Squares | 4, 5. Quotients
Between class means | 4 | 1,043 | $k\,{}_c\sigma_m^2$ = 261
Within classes | 25 | 42,891 | ${}_c\sigma_v^2$ = 1,716
Total | 29 | 43,934 | ${}_c\sigma_t^2$ = 1,515

1 We proved on page 405 that for large samples errors in the mean and s.d. are
uncorrelated in a symmetrical universe. It may be shown generally that for samples
of any size from a normal universe the errors are independent.


We see at once that since the corrected variance within varieties is
greater than that between varieties, any intraclass correlation must be
negative. To test its significance we have:

$$\nu_1 = 25, \qquad \nu_2 = 4$$

$$z = \frac{1}{2}\log_e \frac{1716}{261} = 0.942$$

From the Appendix Tables we see that the 5 per cent. point is about
0.876 and the 1 per cent. point 1.31. The result thus is barely significant.

It is instructive to note that the correction of the variances happens
in this case to give an absurd result, such as was noted in 23.7 might
occur; the variance within classes is made to appear greater than the
total variance, which is impossible.
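The whole analysis of 23.29 to 23.33 can be retraced from the yields of Table 23.2. The following is a sketch in which the thirty yields are entered by variety, as read from the table with the help of its row and column means. Because it works from unrounded means, the sums of squares it produces agree with the printed figures (1,043; 42,891; 43,934) only to within a unit or two.

```python
import math

# Yields in grammes from Table 23.2, one list of six plots per variety.
yields = {
    1: [387, 420, 353, 331, 358, 400],
    2: [372, 455, 375, 328, 383, 308],
    3: [350, 417, 400, 325, 378, 275],
    4: [340, 360, 358, 370, 395, 375],
    5: [398, 358, 334, 340, 320, 430],
}

n = len(yields)                      # number of classes (varieties)
k = len(yields[1])                   # members per class (plots)
all_values = [x for plots in yields.values() for x in plots]
grand_mean = sum(all_values) / (n * k)
class_means = {p: sum(plots) / k for p, plots in yields.items()}

# Column 3 of the analysis: sums of squares.
between_ss = k * sum((mp - grand_mean) ** 2 for mp in class_means.values())
within_ss = sum(
    (x - class_means[p]) ** 2 for p, plots in yields.items() for x in plots
)
total_ss = sum((x - grand_mean) ** 2 for x in all_values)

# Column 2 of Table 23.4: degrees of freedom.
df_between, df_within, df_total = n - 1, n * (k - 1), n * k - 1

# Corrected quotients, as in Table 23.5.
q_between = between_ss / df_between
q_within = within_ss / df_within
q_total = total_ss / df_total

# z test, the larger estimate being placed in the numerator.
z = 0.5 * math.log(max(q_between, q_within) / min(q_between, q_within))

print(round(grand_mean, 1), round(between_ss), round(within_ss), round(total_ss))
print(round(q_between), round(q_within), round(q_total), round(z, 3))
```

The sums of squares add exactly, whereas the corrected quotients do not; in this instance the corrected quotient within classes exceeds that for the total, which is the anomaly remarked upon in the text.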

Correlation Coefficient in Small Samples. 

23.34. Although the distribution of the correlation coefficient in 
samples from a bivariate normal universe tends to the normal form as 
the size of the sample increases, a fact which justifies the use of the 
standard error for large n, the distribution diverges very remarkably 
from the normal when n is small, and even when n is moderately large 
if the correlation in the parent universe is high. Further investigation 
is therefore necessary before we can assess the significance of correlation 
coefficients obtained from small samples. 

23.35. The distribution of the correlation coefficient in samples
from a bivariate normal universe was obtained in an exact form by
R. A. Fisher in 1915. Ordinates of the frequency-curves which give the
distribution have been worked out for various values of n and $\rho$, the
correlation in the universe, and are tabulated in “Tables for Statisticians
and Biometricians, Part II,” and more fully in ref. (577). The general form
of these curves is illustrated in fig. 23.1, which shows the curves for $\rho = +0.6$
and various values of n.

A glance at this figure will show that even for a moderate value of $\rho$,
such as +0.6, the distribution of the coefficient is U-shaped for n = 3,
and, although single-humped, distinctly skew to the eye even for n = 20.
For high values of $\rho$, such as +0.9, the distribution is skew for higher
values of n.

As a result it is safe to say that the values of correlation coefficients
calculated from samples of less than five will throw no light on the
existence of correlation in the universe. For samples of 20 or 30 we cannot
apply the standard error with much confidence if the correlation in the
universe is likely to be very high, whether positive or negative. 50
seems to be the minimum number in the sample for the application of
the standard error if $\rho$ is very high, and 100 is safer.

23.36. The equation giving the distribution of the correlation coefficient
is very complex, but tables have been prepared showing the areas under
the frequency curves for various values of n, $\rho$ and r. 1 These tables may
be used to assess the significance of an observed value of r from a bivariate
normal universe. For most practical purposes, however, use may be
made of a method due to R. A. Fisher, the essence of which is the
transformation of the distribution of r into a new distribution which is
approximately normal.

1 F. N. David, Tables of the Correlation Coefficient, 1938, Biometrika Office,
University College, London.



Fig. 23.1.—Frequency Distribution of the Correlation Coefficient in Samples from a
Normal Universe with Correlation +0.6 for Various Values of the Number in the
Sample n. (In each case the total frequency, i.e. the area under the curve, is
unity.)


23.37. Before we discuss this process, however, it is desirable to 
point out the degree of applicability of our results, 

(1) In the first place, it has been shown that the distribution of partial 
correlation coefficients in samples of n is of tlie same form as fhat of total 
correlation coefficients in samples of n ~ p , where p is the number of 
secondary subscripts in the partial coefficient. 

(2) Secondly, our results are strictly true only for normal universes. 
There is some experimental evidence to show that they are true for all 
practical purposes even if the parent is moderately skew but remains of 
the single-humped type ; but if there is any reason to suppose that the 
parent is J- or U-shaped according to one or more variates, the student 
should draw his conclusions with the utmost reserve. 




Fisher’s Transformation. 

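23.38. If $r$ and $\rho$ are the correlations in the sample and the universe
respectively, let us put

$$ r = \tanh z, \qquad \rho = \tanh \zeta $$

so that

$$ z = \tfrac{1}{2}\log_e \frac{1+r}{1-r}, \qquad \zeta = \tfrac{1}{2}\log_e \frac{1+\rho}{1-\rho} \qquad (23.20) $$

Then it may be shown that $z$ is, to a close approximation, distributed
normally about mean $\zeta$ with standard deviation $1/\sqrt{n-3}$.

In fact, the mean of $z$ is given by

$$ \zeta + \frac{\rho}{2(n-1)} + \text{terms in } \frac{1}{(n-1)^2}, \text{ etc.} \qquad (23.21) $$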


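and, for the $z$-distribution, about the mean

$$ \beta_1 = \frac{\rho^2}{(n-1)^3}\left(\rho^2 - \tfrac{9}{16}\right)^2 + \text{terms in } \frac{1}{(n-1)^4}, \text{ etc.} \qquad (23.22) $$

$$ \beta_2 = 3 + \frac{32 - 3\rho^4}{16(n-1)} + \text{terms in } \frac{1}{(n-1)^2}, \text{ etc.} \qquad (23.23) $$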


For $n = 11$, say, $\beta_1$ is of the order of 0.001 even if $\rho$ is high, which shows
how closely the $z$-distribution lies to the symmetrical; and $\beta_2 - 3$ is of the
order of 0.2, which shows that the distribution has nearly normal kurtosis.
In such a case the mean of $z$ would differ from $\zeta$ by 0.05, which is not large,
but might be important in some cases. The standard error of $z$ is, however,
$1/\sqrt{n-3}$, and the factor $\rho/\{2(n-1)\}$ may, as a rule, be neglected in
comparison. This is the basis of the statement above that $z$ is normally
distributed about mean $\zeta$.

We now give some examples of the use of the $z$-transformation in
testing the significance of an observed $r$.


Example 23.5. In Example 11.1, page 215, we found that the correlation
between the price indices of animal feeding-stuffs and home-grown
oats is 0.68, the sample consisting of 60 members.

This sample is large enough for us to use the standard error. If we do
so we get

$$ \sigma_r = \frac{1 - r^2}{\sqrt{n}} = \frac{1 - (0.68)^2}{\sqrt{60}} = 0.07 \text{ approximately} $$

The correlation thus is undoubtedly significant.

* This z is to be distinguished from the z of Fisher's distribution of 23.24. 





We might, alternatively, use the $z$ test, thus, to answer the question,
“Could the observed value have arisen from an uncorrelated universe?”
On this hypothesis

$$ \rho = 0 \quad \text{and} \quad \zeta = 0 $$

We have:

$$ z = \tfrac{1}{2}\log_e \frac{1.68}{0.32} = 0.829 $$

The standard error of $z$ is $1/\sqrt{57} = 0.13$.

The deviation of $z$ from $\zeta$ is more than six times this, and we conclude
that our hypothesis was incorrect, i.e. that the universe is correlated.

Example 23.6. Continuing the previous example, could the observed
correlation have arisen from a universe in which $\rho = +0.8$?

Here

$$ \zeta = \tfrac{1}{2}\log_e \frac{1+\rho}{1-\rho} = \tfrac{1}{2}\log_e \frac{1.8}{0.2} = 1.099 $$

The deviation of $z$ from $\zeta$ is, therefore,

$$ 1.099 - 0.829 = 0.270 $$

This is about twice the standard error of $z$. It might arise, though
rarely, as a sampling fluctuation, and we conclude that $\rho$ is likely to be less
than $+0.8$.

Example 23.7. In Example 14.1, page 270, we found a partial correlation
of -0.73 (38 unions) between earnings of agricultural labourers and
the percentage of the population in receipt of relief, when the ratio of
numbers in receipt of outdoor relief to those relieved in the workhouse was
constant. Is this significant, and can it have arisen from a universe in
which the real correlation is -0.667?

Here

$$ z = \tfrac{1}{2}\log_e \frac{0.27}{1.73} = -0.929 $$

$\zeta$ for an uncorrelated universe $= 0$

$$ \zeta, \text{ if } \rho = -0.667, \; = \tfrac{1}{2}\log_e \frac{0.333}{1.667} = -0.805 $$

There is one secondary subscript in the partial correlation. Hence, the
standard error of $z = 1/\sqrt{38 - 1 - 3} = 0.1715$.

If $\zeta = 0$, the deviation is more than five times the standard error and
is undoubtedly significant. If $\rho = -0.667$, the deviation is less than the
standard error and hence may very well have arisen from sampling
fluctuations.
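The arithmetic of Examples 23.5 to 23.7 may be checked with a short Python sketch. The function names below are illustrative choices and not part of the text; the formulae are those of equation (23.20) and the standard error $1/\sqrt{n-3}$, with $n$ reduced by the number of secondary subscripts when a partial coefficient is tested, as stated in 23.37.

```python
import math

def fisher_z(r):
    """Fisher's transformation: z = (1/2) log_e((1 + r) / (1 - r))."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def z_standard_error(n, secondary_subscripts=0):
    """Standard error of z, 1/sqrt(n - 3); for a partial correlation
    n is first reduced by the number of secondary subscripts."""
    return 1.0 / math.sqrt(n - secondary_subscripts - 3)

def deviation_in_standard_errors(r, rho, n, secondary_subscripts=0):
    """Deviation of z from zeta, expressed in units of the standard error."""
    deviation = fisher_z(r) - fisher_z(rho)
    return deviation / z_standard_error(n, secondary_subscripts)

# Example 23.5: r = 0.68, n = 60, hypothesis rho = 0
print(fisher_z(0.68), z_standard_error(60))
print(deviation_in_standard_errors(0.68, 0.0, 60))

# Example 23.6: same sample, hypothesis rho = +0.8
print(deviation_in_standard_errors(0.68, 0.8, 60))

# Example 23.7: partial r = -0.73, 38 unions, one secondary subscript
print(deviation_in_standard_errors(-0.73, 0.0, 38, secondary_subscripts=1))
print(deviation_in_standard_errors(-0.73, -0.667, 38, secondary_subscripts=1))
```

The ratios printed are to be judged against the normal distribution, in the same way as the deviations quoted in the examples.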


Application of “Student’s” Distribution to Correlation Coefficients.

23.39. The test we have just given is of general application, but it
is worth noticing that if $\rho = 0$, the distribution of the correlation coefficient
in small samples from a normal universe may be tested by the “Student”
distribution.

In fact, the distribution of the correlation coefficient assumes a
particularly simple form for such uncorrelated universes, namely,

$$ y = y_0 (1 - r^2)^{\frac{n-4}{2}} \qquad (23.24) $$

If we put

$$ t = \frac{r}{\sqrt{1 - r^2}}\sqrt{n - 2} \qquad (23.25) $$

then it may be shown that $t$ is distributed in the “Student” form with
$n - 2$ degrees of freedom, and its significance may be tested accordingly.
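Equation (23.25) is easily evaluated by machine. The following Python sketch (function names are illustrative assumptions) converts an observed $r$ into $t$, and also inverts the relation so that a tabulated significance point of $t$ can be turned into the corresponding value of $r$.

```python
import math

def t_from_r(r, n):
    """Equation (23.25): t = r * sqrt(n - 2) / sqrt(1 - r^2),
    to be referred to "Student's" distribution with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def r_from_t(t, n):
    """Inverse of (23.25): the r that corresponds to a given t."""
    return t / math.sqrt(t * t + (n - 2))

# A sample of 15 giving r = -0.5 (hypothetical illustration)
t = t_from_r(-0.5, 15)
print(t, "with", 15 - 2, "degrees of freedom")
```

The value of $t$ so found is then looked up in the tables of “Student's” distribution in the usual way.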


Significance of the Correlation Ratio. 

23.40. The distribution of $\eta^2$ in samples from an uncorrelated normal
universe may be derived from Fisher’s $z$-distribution. Hence we may test
whether an observed value of $\eta^2$ is significant of the existence of correlation
in the parent, assumed normal or approximately so.

When considering the correlation ratio in 13.6 we saw that for the
arrays of $x$

$$ \sigma_x^2 = \sigma_{ax}^2 + \sigma_{mx}^2 $$

where

$\sigma_x^2$ is the variance of the whole,
$\sigma_{ax}^2$ is the variance within arrays,
$\sigma_{mx}^2$ is the variance of array means.

If there are $p$ arrays and $n_p$ is the number of members in the $p$th array,
we may write this:

$$ S(x - \bar{x})^2 = S(x - \bar{x}_p)^2 + S\{n_p(\bar{x}_p - \bar{x})^2\} \qquad (23.26) $$

Now let us regard the arrays as classes, and the items of the arrays
as class-members. Equation (23.26) is then an analysis of the sums of
squares of the type which we have studied in the analysis of variance.
The numbers $n_p$ are not constant in each class, as was $k$, but this makes no
material difference, and we may apply the results of 23.30 to 23.33.

Using the corrected variances, we may write the analysis in the following
tabular form.



Table 23.6.

Sums relating to Variation | Divisor (Degrees of Freedom) | Sums of Squares | Quotients (column 5)
Between class means | $p - 1$ | $S\{n_p(\bar{x}_p - \bar{x})^2\}$ | $\dfrac{N\sigma_x^2\eta_{xy}^2}{p-1}$
Within classes | $N - p$ | $S(x - \bar{x}_p)^2$ | $\dfrac{N\sigma_x^2(1-\eta_{xy}^2)}{N-p}$
Total | $N - 1$ | $S(x - \bar{x})^2$ | -




In column 5 we have anticipated results which are easily proved as
follows:

By definition,

$$ S(x - \bar{x})^2 = N\sigma_x^2 $$

Hence,

$$ S\{n_p(\bar{x}_p - \bar{x})^2\} = N\sigma_x^2\eta_{xy}^2 $$

and, by (23.26), $S(x - \bar{x}_p)^2 = N\sigma_x^2(1 - \eta_{xy}^2)$.
Dividing the sums of squares by the appropriate number of degrees of
freedom, we get the results of column 5.

Now, if the universe is normal and uncorrelated, the two items
in column 5 are not significantly different; for they are independent
estimates of the variance of $x$ in the universe, all arrays having the same
mean and standard deviation.¹ We may test the significance of their
difference by the $z$-distribution. We have:

$$ z = \tfrac{1}{2}\log_e\left\{\frac{N\sigma_x^2\eta^2}{p-1} \Big/ \frac{N\sigma_x^2(1-\eta^2)}{N-p}\right\} = \tfrac{1}{2}\log_e\left\{\frac{\eta^2}{1-\eta^2}\cdot\frac{N-p}{p-1}\right\} \qquad (23.27) $$

$$ \nu_1 = p - 1, \qquad \nu_2 = N - p \qquad (23.28) $$

In equation (23.27) we have omitted the suffix $xy$ in writing $\eta^2$.
Clearly a similar test may be applied to $\eta_{yx}^2$, $p$ in this case referring to the
number of $y$-arrays.

23.41. From the relation (23.27) between $z$ and $\eta^2$ it may be shown
that the distribution of $\eta^2$, corresponding to that of $z$ given by equation
(23.15), is

$$ y = y_0 (\eta^2)^{\frac{p-3}{2}} (1 - \eta^2)^{\frac{N-p-2}{2}} \qquad (23.29) $$

¹ Strictly speaking, this is only approximately true of arrays of finite width. If the
ranges defining the arrays are very broad, the test must be used with reserve.

It will be seen that this involves the number $p$, i.e. depends on the
number of arrays into which the data are grouped. This fact is important,
and reveals that the use of the standard error $(1 - \eta^2)/\sqrt{N}$, given in 21.27, can
be no more than an approximation at the best; for that formula does not
contain $p$.


23.42. The tables of the significance points of $z$ are designed mainly
for small samples. If the data are grouped, as they must be for the
calculation of $\eta^2$ to be possible, at least one of $\nu_1$, $\nu_2$ is likely to be large.
In such cases, however, interpolation will usually give results accurate
enough for the purpose in view. But special tables have been prepared
by T. L. Woo and appear in “Tables for Statisticians and Biometricians,”
Part 2, to enable closer approximations to be made without arithmetical
labour.

23.43. It is interesting to note that, since $\eta^2$ is positive, its mean
value will not be zero. The mean value (which differs from the square of
the mean value of $\eta$) is given by

$$ \overline{\eta^2} = \frac{p-1}{N-1} \qquad (23.30) $$


Example 23.8. Let us consider the data of Table 11.3 (correlation
between stature of father and stature of son), in which $\eta_{xy} = \eta_{yx} = 0.52$.
We know that the distribution is approximately normal, a fact which is
borne out by the approximate equality of the two correlation ratios, and
hence we may apply the foregoing theory with considerable confidence.

We have, for $\eta_{yx}$:

$$ \nu_1 = p - 1 = 16 $$

$$ \nu_2 = N - p = 1078 - 17 = 1061 $$

$$ z = \tfrac{1}{2}\log_e\left\{\frac{(0.52)^2}{1 - (0.52)^2}\cdot\frac{1061}{16}\right\} = 1.60 $$

From Appendix Table 6C we see that the 0.1 per cent. significance
points are as follows:

 | $\nu_1 = 12$ | $\nu_1 = 24$
$\nu_2 = 60$ | 0.5992 | 0.4955
$\nu_2 = \infty$ | 0.5044 | 0.3786

The observed $z$ is therefore very strongly significant of correlation in
the universe.
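As a check on Example 23.8, equations (23.27), (23.28) and (23.30) may be evaluated in a few lines of Python. The function names are illustrative assumptions, not part of the text; the significance points themselves must still be taken from the tables.

```python
import math

def z_for_correlation_ratio(eta, N, p):
    """Equations (23.27) and (23.28): returns (z, nu1, nu2) for a
    correlation ratio eta from N observations grouped in p arrays."""
    nu1 = p - 1
    nu2 = N - p
    z = 0.5 * math.log(eta ** 2 / (1.0 - eta ** 2) * nu2 / nu1)
    return z, nu1, nu2

def mean_eta_squared(N, p):
    """Equation (23.30): mean value of eta^2 in an uncorrelated universe."""
    return (p - 1) / (N - 1)

# Example 23.8: eta = 0.52, N = 1078, p = 17 arrays
z, nu1, nu2 = z_for_correlation_ratio(0.52, 1078, 17)
print(z, nu1, nu2)
print(mean_eta_squared(1078, 17))
```

The second figure printed shows how small $\eta^2$ would be, on the average, if the universe were uncorrelated and the same grouping were used.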


Test of Linearity of Regression. 

23.44. In 13.7 we saw that the regression of $y$ on $x$ was linear if, and
only if, $\eta_{yx}^2 - r^2 = 0$. An important question to decide is, therefore, can
an observed value of $\eta^2 - r^2$ have arisen from a universe in which the
regression is linear, i.e. the true value is zero?

This question can be decided by the $z$ test in a similar manner to that
of 23.40 and 23.41. We consider the analysis of the sums of squares of
deviations from the regression line into two parts: (1) deviations within
arrays, and (2) deviations of means of arrays from the regression line. In
this way it may be shown that the linearity may be tested by taking

$$ z = \tfrac{1}{2}\log_e\left\{\frac{\eta^2 - r^2}{1 - \eta^2}\cdot\frac{N-p}{p-2}\right\} \qquad (23.31) $$

$$ \nu_1 = p - 2, \qquad \nu_2 = N - p \qquad (23.32) $$


Example 23.9. In considering the correlation between old-age
pauperism ($x$) and the proportion of out-relief ($y$), Yule found (“Economic
Journal,” vol. 6, 1896, p. 613)

$$ N = 235, \qquad r = +0.34, \qquad \eta_{xy} = 0.46, \qquad \eta_{yx} = 0.39 $$

for a grouping of 19 $x$-arrays and 8 $y$-arrays. Can the regressions be
supposed linear?

For the $x$-arrays, $N - p = 216$, $p - 2 = 17$.

$$ \frac{\eta^2 - r^2}{1 - \eta^2} = \frac{(0.46)^2 - (0.34)^2}{1 - (0.46)^2} = 0.12177 $$

$$ z = \tfrac{1}{2}\log_e\left(0.12177 \times \frac{216}{17}\right) = 0.218 $$

The 5 per cent. point for $\nu_1 = 17$, $\nu_2 = \infty$, is about 0.25, and there is thus
no reason to suppose from the observed $z$ that the regression is not linear.

For the $y$-arrays, similarly, $p - 2 = 6$.

$$ z = \tfrac{1}{2}\log_e\left\{\frac{(0.39)^2 - (0.34)^2}{1 - (0.39)^2}\cdot\frac{227}{6}\right\} = 0.244 $$

This also will be found to lie within the sampling limits, and the test
therefore does not reject the linearity of either regression.
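The two values of $z$ in Example 23.9 follow directly from equations (23.31) and (23.32). A minimal Python sketch, with an illustrative function name of our own choosing, is:

```python
import math

def z_for_linearity(eta, r, N, p):
    """Equations (23.31) and (23.32): returns (z, nu1, nu2) for testing
    whether eta^2 - r^2 can have arisen from a linear regression."""
    nu1 = p - 2
    nu2 = N - p
    ratio = (eta ** 2 - r ** 2) / (1.0 - eta ** 2)
    z = 0.5 * math.log(ratio * nu2 / nu1)
    return z, nu1, nu2

# Example 23.9: N = 235, r = 0.34
print(z_for_linearity(0.46, 0.34, 235, 19))   # 19 x-arrays
print(z_for_linearity(0.39, 0.34, 235, 8))    # 8 y-arrays
```

The sketch assumes $\eta^2 > r^2$, as must be the case for a correlation ratio and a correlation coefficient calculated from the same grouped data.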


Significance of the Multiple Correlation Coefficient. 

23.45. The multiple correlation coefficient is in many ways analogous
to the correlation ratio, and we may test its significance by a procedure
very similar to that used for the significance of the correlation ratio and
regressions.

Consider the regression equation with $p$ variates,

$$ x_1 = b_2 x_2 + b_3 x_3 + \dots + b_p x_p $$

the variates being measured from their means.

We may regard the deviations of observed values of $x_1$ as composed of
two parts: (1) deviations from the values of $x_1$ given by the regression
equation, and (2) deviations of the latter from the mean of $x_1$. The sum
of squares can be analysed accordingly.

The sum of squares of deviations of observed values of $x_1$ from the
mean of $x_1$ is $N\sigma_1^2$ by definition, and has $N - 1$ degrees of freedom.

The sum of squares of deviations of observed $x_1$'s from the regression
values is $N\sigma_{1.23\ldots p}^2$ which, by the definition of $R_{1(23\ldots p)}$, is equal to
$N\sigma_1^2(1 - R_{1(23\ldots p)}^2)$. This has $N - p$ degrees of freedom, for $\sigma_1^2$
has $N - 1$ degrees of freedom, $\sigma_{1.2}^2$ has $N - 2$ degrees, and so on. Writing
$R$ for $R_{1(23\ldots p)}$ we may express the analysis in the following tabular
form:

Table 23.7.

Sums relating to Variation | Degrees of Freedom | Sums of Squares | Quotients (column 5)
Between class means (regression values from mean) | $p - 1$ | $R^2 N\sigma_1^2$ | $\dfrac{R^2 N\sigma_1^2}{p-1}$
Within classes (deviations from regression values) | $N - p$ | $(1 - R^2)N\sigma_1^2$ | $\dfrac{(1-R^2)N\sigma_1^2}{N-p}$
Total | $N - 1$ | $N\sigma_1^2$ | -


Now if the universe value of $R$ is zero, the corrected variances of column 5
should not differ significantly; for $x_1$ and $b_2x_2 + \dots + b_px_p$ are then
uncorrelated, and hence deviations of $x_1$ from the regression values are
uncorrelated with, and independent of, deviations of the regression values
from the mean, the universe being normal.

Hence we may test the significance of $R$ by putting

$$ z = \tfrac{1}{2}\log_e\left\{\frac{R^2}{1 - R^2}\cdot\frac{N-p}{p-1}\right\} \qquad (23.33) $$

$$ \nu_1 = p - 1, \qquad \nu_2 = N - p \qquad (23.34) $$

It will be seen that equation (23.33) is of the same form as equation
(23.27). The distributions of $R^2$ and $\eta^2$ are formally identical, and we have,
for instance, corresponding to equation (23.30),

$$ \overline{R^2} = \frac{p-1}{N-1} \qquad (23.35) $$


Example 23.10. In Example 14.3, page 270, we found $R_{1(23)} = 0.74$.
Is this significant?

We have:

$$ p = 3, \qquad N = 38 $$

$$ \nu_1 = 2, \qquad \nu_2 = 35 $$

$$ z = \tfrac{1}{2}\log_e\left\{\frac{(0.74)^2}{1 - (0.74)^2}\cdot\frac{35}{2}\right\} = 1.53 $$

For $\nu_1 = 2$, the 0.1 per cent. significance points are:

$\nu_2 = 30$ | 1.0859
$\nu_2 = 40$ | 1.0552

The observed $z$ is well above these values and hence $R$ is significant.
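The analysis of Table 23.7 may be set out numerically as a check. The Python sketch below (names are illustrative assumptions) forms the two sums of squares and their quotients for a given $R$, $N$ and $p$, and shows that the factor $N\sigma_1^2$ cancels from $z$, so that its value need not be known.

```python
import math

def analysis_for_multiple_R(R, N, p, variance=1.0):
    """Rows of Table 23.7 for a multiple correlation R between p variates
    in a sample of N; `variance` stands for sigma_1^2 and cancels from z."""
    total_ss = N * variance
    between_ss = R ** 2 * total_ss           # regression values from mean
    within_ss = (1.0 - R ** 2) * total_ss    # deviations from regression values
    between_quotient = between_ss / (p - 1)
    within_quotient = within_ss / (N - p)
    z = 0.5 * math.log(between_quotient / within_quotient)
    return {
        "between": (p - 1, between_ss, between_quotient),
        "within": (N - p, within_ss, within_quotient),
        "total": (N - 1, total_ss),
        "z": z,
    }

# Example 23.10: R = 0.74, N = 38, p = 3
table = analysis_for_multiple_R(0.74, 38, 3)
print(table["z"])
```

Changing the `variance` argument alters the sums of squares but leaves $z$ unchanged, in agreement with equation (23.33).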


SUMMARY. 

1. As an estimate of the mean of the universe we may take the mean
of the sample, whether large or small.

2. If the mean of the universe is known, we may take the mean
square deviation about that mean as an estimate of the variance of the
universe; i.e. the estimate is given by

$$ \sigma^2 = \frac{1}{n} S(x - m)^2 $$

3. If the mean of the universe is not known, a preferable estimate of
the universe variance is the “corrected” variance of the sample, given by

$$ \frac{1}{n-1} S(x - \bar{x})^2 $$

4. This estimate is said to have $n - 1$ degrees of freedom.

5. In samples from a normal universe the parameter $t$, given by

$$ t = \frac{\bar{x} - m}{{}_c\sigma_s}\sqrt{\nu + 1} $$

where $\nu = n - 1$, is distributed according to the law (due to “Student”)

$$ y = \frac{y_0}{\left(1 + \dfrac{t^2}{\nu}\right)^{\frac{\nu+1}{2}}} $$

This distribution may be used to give the probability of getting a value
of $t$ between specified limits on random sampling.

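6. With two samples, $x_1, \ldots, x_{n_1}$ and $x'_1, \ldots, x'_{n_2}$, from the same
normal universe, the parameter $t$ defined by

$$ t = \frac{\bar{x} - \bar{x}'}{{}_c\sigma_s}\sqrt{\frac{n_1 n_2}{n_1 + n_2}} $$

where

$$ {}_c\sigma_s^2 = \frac{S(x - \bar{x})^2 + S(x' - \bar{x}')^2}{n_1 + n_2 - 2} $$

and

$$ \nu = n_1 + n_2 - 2 $$

is also distributed according to the above law, with $\nu$ degrees of freedom.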

7. With two samples, as before, with estimated variances ${}_c\sigma_1^2$ and
${}_c\sigma_2^2$, the parameter

$$ z = \tfrac{1}{2}\log_e \frac{{}_c\sigma_1^2}{{}_c\sigma_2^2} $$

is distributed according to the law (due to R. A. Fisher)

$$ y = y_0 \frac{e^{\nu_1 z}}{(\nu_1 e^{2z} + \nu_2)^{\frac{\nu_1 + \nu_2}{2}}} $$

where

$$ \nu_1 = n_1 - 1, \qquad \nu_2 = n_2 - 1 $$

As usual, this distribution may be used to give the probability of
getting a value of $z$ between specified limits on random sampling.

8. If the data are arranged in $n$ classes of $k$ members each, the
significance of differences between the classes may be tested by comparing
$k\sigma_m^2$ with $\sigma_v^2$, where $\sigma_m^2$ is the variance of class means about the mean
of the whole, and $\sigma_v^2$ is the average of the variances within classes.

If the sample is small, the comparison may be carried out by applying
the $z$ test to the “corrected” variances ${}_c\sigma_m^2$ and ${}_c\sigma_v^2$ with $n - 1$ and
$n(k - 1)$ degrees of freedom respectively, the parent universe being assumed
normal.

9. The distribution of the correlation coefficient in samples from a
normal bivariate universe is not normal. However, putting

$$ z = \tfrac{1}{2}\log_e \frac{1+r}{1-r}, \qquad \zeta = \tfrac{1}{2}\log_e \frac{1+\rho}{1-\rho} $$

where $\rho$ is the correlation in the universe, it may be shown that $z$ is
approximately normally distributed about $\zeta$ with standard deviation
$1/\sqrt{n-3}$, $n$ being the number in the sample.

10. This result remains true of partial correlation coefficients, but in
the above formulae $n$ must be taken to be the number in the sample less
the number of secondary subscripts in the coefficient tested.

11. In samples from an uncorrelated normal universe the distribution
of $r$ is given by

$$ y = y_0 (1 - r^2)^{\frac{n-4}{2}} $$

The parameter $t$, defined by

$$ t = \frac{r}{\sqrt{1 - r^2}}\sqrt{n - 2} $$

is distributed in the “Student” form in such cases with $n - 2$ degrees of
freedom.

12. The significance of $\eta^2$ from an uncorrelated normal population may
be tested in Fisher’s distribution by putting

$$ z = \tfrac{1}{2}\log_e\left\{\frac{\eta^2}{1 - \eta^2}\cdot\frac{N-p}{p-1}\right\} $$

$$ \nu_1 = p - 1, \qquad \nu_2 = N - p $$

where $N$ is the total number in the sample and there are $p$ arrays.

13. The same formulae give a test for the multiple correlation coefficient
$R$, from a normal universe, if $R^2$ be substituted for $\eta^2$, $p$ being the total
number of subscripts to $R$.

14. The linearity of regression in a normal universe, as judged from
the value of $\eta^2 - r^2$, may similarly be tested in the $z$ distribution by putting

$$ z = \tfrac{1}{2}\log_e\left\{\frac{\eta^2 - r^2}{1 - \eta^2}\cdot\frac{N-p}{p-2}\right\} $$

$$ \nu_1 = p - 2, \qquad \nu_2 = N - p $$


EXERCISES. 

23.1. Find “Student’s” $t$ for the following variate values in a sample of 10:
-6, -4, -3, -2, -2, 0, 1, 1, 3, 5, taking $m$ to be zero, and find from the tables
the probability of getting a value of $t$ as great or greater on random sampling
from a normal universe.

23.2. A farmer grows crops on two fields, A and B. On A he puts £1 worth
of manure per acre and on B £2 worth. The net returns per acre, exclusive of
the cost of manure, on the two fields in five years are:

Year | Field A, £ per Acre | Field B, £ per Acre
1 | 17 | 24
2 | 18 | 18.5
3 | 14 | 19
4 | 16.5 | 22
5 | 21 | 25

Other things being equal, discuss the question whether it is likely to pay the
farmer to continue the more expensive dressing. State clearly the assumptions
which you make.

23.3. The heights of six randomly chosen sailors are, in inches: 63, 65, 68,
69, 71 and 72. Those of ten randomly chosen soldiers are: 61, 62, 65, 66, 69, 69,
70, 71, 72 and 73. Discuss the light that these data throw on the suggestion
that soldiers are, on the average, taller than sailors.

23.4. In the data of Exercise 23.3, use the $z$-distribution to discuss whether
the samples can have come from universes which are identical so far as height
distribution is concerned.

23.5. In three samples of 50 lines each from Shakespeare’s “Romeo and
Juliet” (an early play), the following numbers of weak endings were observed:
7, 9, 10. In three similar samples from “Cymbeline” (late), the numbers of
weak endings were 15, 11, 12. Discuss the suggestion that Shakespeare’s
prosody, as judged by the number of weak endings, changed with advancing
years.



23.6. A random sample of 15 from a normal universe gives a correlation
coefficient of -0.5. Is this significant of the existence of correlation in the
universe?

23.7. Show that in samples of four from an uncorrelated normal universe
all values of the correlation coefficient are equally probable; and that for
samples of less than four a zero coefficient is the most improbable.

23.8. What is the probability that a correlation coefficient of +0.75 or less
can arise in a sample of 80 from a normal universe in which the true correlation
is +0.9? Compare this with the result given by assuming the sampling
distribution normal with standard deviation $(1 - r^2)/\sqrt{n}$.

23.9. Test the significance of the partial correlation coefficients of Example
14.1, page 270.

23.10. Test the significance of the two multiple correlation coefficients of
Example 14.3, page 279, other than the one tested in Example 23.10.

23.11. Show that in samples of 25 from an uncorrelated normal universe the
chance is 1 in 100 that $r$ is greater than about 0.48.

23.12. Referring to Exercise 13.1, test the linearity of the regressions of the
distribution of cows in Table 11.4, page 200.



CHAPTER 24.

INTERPOLATION AND GRADUATION. 


Simple Interpolation. 

24.1. If the value of a function of a single variable $x$, say $u_x$, has
been tabulated for equidistant values of the variable $x$, $x + h$, $x + 2h$, etc.,
we often require to find the value of the function corresponding to an
intermediate value of the variable. Functions in very general use, such
as common logarithms, have usually been tabulated with intervals so small
that even over a range of several intervals the relation between $u_x$ and $x$
may be assumed to be effectively linear, that is of the form

$$ u_x = a_0 + a_1 x \qquad (24.1) $$

as is shown by the constancy of the differences between successive values
of $u$. For example,

Table 24.1.

Number | Logarithm | Difference (+)
30597 | 4.4856788 |
      |           | 0.0000142
30598 | 4.4856930 |
      |           | 0.0000142
30599 | 4.4857072 |
      |           | 0.0000142
30600 | 4.4857214 |
      |           | 0.0000142
30601 | 4.4857356 |
      |           | 0.0000142
30602 | 4.4857498 |

If we then require, say, the value of log 30600.3, it is sufficient to use the
familiar process of simple interpolation:

log 30600         = 4.4857214
0.3 x 0.0000142   =        43
log 30600.3       = 4.4857257

The little multiplication sum is, in most tables, already done for us in the
margin.
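The same sum may be written as a two-line Python sketch. The function name is an illustrative assumption; the two tabulated logarithms are those of Table 24.1, and the last line compares the interpolated value with the logarithm computed directly.

```python
import math

def simple_interpolation(u0, u1, fraction):
    """Linear interpolation between two adjacent tabulated values,
    `fraction` being the distance from u0 in units of the interval."""
    return u0 + fraction * (u1 - u0)

log_30600 = 4.4857214   # tabulated seven-figure values
log_30601 = 4.4857356

value = simple_interpolation(log_30600, log_30601, 0.3)
print(round(value, 7))
print(abs(value - math.log10(30600.3)))   # discrepancy from direct calculation
```

The discrepancy is of the order of the rounding error of the seven-figure table, so nothing is gained here by going beyond first differences.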

Differences. 

24.2. For any function which has been tabulated to sufficiently fine
intervals (within certain limitations) simple interpolation can be used in
this way; it is only a question of making the intervals sufficiently small
(see below, 24.16). But many functions have not been tabulated in such
detail, successive differences are not equal, and consequently simple
interpolation cannot give an accurate result. The problem then arises,
how are we to interpolate with reasonable precision? And the answer is
given by proceeding to higher orders of differences, as they are termed; i.e.
instead of considering only the differences

$$ \Delta_0^1 = u_1 - u_0 $$
$$ \Delta_1^1 = u_2 - u_1 $$
$$ \Delta_2^1 = u_3 - u_2 $$

etc., we also consider the second differences

$$ \Delta_0^2 = \Delta_1^1 - \Delta_0^1 $$
$$ \Delta_1^2 = \Delta_2^1 - \Delta_1^1 $$
$$ \Delta_2^2 = \Delta_3^1 - \Delta_2^1 $$

etc., or even the third differences, fourth differences, etc.

24.3. To take an actual example, Table 24.2 shows the squares of
the first few natural numbers, together with their first and second
differences. Following a practice which is convenient for printing and for most
purposes of practical work, each difference is printed, not on a line between
the two figures to which it relates, as with the logarithms in Table 24.1
above, but on the same line as the upper figure of the two concerned, that is,
the line of the figure subtracted; and as the signs of the differences are
constant for each column this sign is simply stated at the top.

Table 24.2.

Number $x$ | Square $u_x$ | First Diff. $\Delta^1$ (+) | Second Diff. $\Delta^2$ (+) | Third Diff. $\Delta^3$
0 | 0 | 1 | 2 | 0
1 | 1 | 3 | 2 | 0
2 | 4 | 5 | 2 | 0
3 | 9 | 7 | 2 | -
4 | 16 | 9 | - | -
5 | 25 | - | - | -


Here we see that the first differences, the only ones with which we
have been concerned hitherto, are no longer constant; but they follow a
simple rule, in that they are an arithmetic series, a linear function of $x$.
As a result, the second differences are constant, actually +2, and
consequently the third differences vanish.

24.4. The figures on the first line of such a table are called the leading
term (0) and the leading differences (+1, +2, 0), and it is evident
that, given the leading term and the leading differences, the whole table
could be built up by successive addition as far as we pleased, without
calculating any square directly except for checking. The series of first
differences would be obtained by adding 2 over and over again, starting
from the leading difference 1, i.e. 1 + 2 = 3, 3 + 2 = 5, etc. The squares
would be given then by adding these differences in succession to the
leading term 0: 0 + 1 = 1; 1 + 3 = 4; 4 + 5 = 9, etc.
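The building-up of a table by successive addition may be sketched in Python as follows. The function name is an illustrative assumption, and the sketch supposes that the last difference supplied is constant all the way down its column (as the zero third difference is for the squares).

```python
def build_from_leading(leading_term, leading_diffs, count):
    """Rebuild `count` tabulated values from the leading term and the
    leading differences, using additions only. The last difference
    given is assumed constant down its column."""
    row = [leading_term] + list(leading_diffs)
    values = []
    for _ in range(count):
        values.append(row[0])
        # every entry of the next line is the entry above it plus the
        # entry above and to the right; the last column is carried down
        row = [row[i] + row[i + 1] for i in range(len(row) - 1)] + [row[-1]]
    return values

# Squares: leading term 0, leading differences +1, +2, 0
print(build_from_leading(0, [1, 2, 0], 6))
```

No multiplication is performed at any stage, which is the point of the method when a long table has to be constructed.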

Differences of a Polynomial. 

24.5. From these results we may conclude quite generally that the
second differences of any polynomial of the second degree,

$$ u_x = a_0 + a_1 x + a_2 x^2 \qquad (24.2) $$

are constant and the third differences vanish. For, if we multiply all the
squares in Table 24.2 by any factor $a_2$, we merely multiply all the differences
of every order by the same factor; and the linear part of the function,
$a_0 + a_1 x$, cannot contribute to second differences.

Below we give a similar table, Table 24.3, for the cubes of the first few
natural numbers, and here it will be seen that third differences are constant

Table 24.3.

Number $x$ | Cube $u_x$ | First Diff. $\Delta^1$ (+) | Second Diff. $\Delta^2$ (+) | Third Diff. $\Delta^3$ (+) | Fourth Diff. $\Delta^4$
0 | 0 | 1 | 6 | 6 | 0
1 | 1 | 7 | 12 | 6 | 0
2 | 8 | 19 | 18 | 6 | -
3 | 27 | 37 | 24 | - | -
4 | 64 | 61 | - | - | -
5 | 125 | - | - | - | -


and fourth differences vanish. By similar reasoning we may conclude that
the third differences of any polynomial of the third degree,

$$ u_x = a_0 + a_1 x + a_2 x^2 + a_3 x^3 \qquad (24.3) $$

are constant and the fourth differences vanish. The student will be quite
correct if he draws the general conclusion that for a polynomial of the $r$th
degree,

$$ u_x = a_0 + a_1 x + a_2 x^2 + \dots + a_r x^r \qquad (24.4) $$

the $r$th differences are constant and the $(r+1)$th differences vanish. To
prove this it is only necessary to note that each successive differencing
lowers the degree of a polynomial by unity, for the difference of any term
$x^k$ is

$$ (x + 1)^k - x^k = kx^{k-1} + \frac{k(k-1)}{1 \cdot 2}x^{k-2} + \dots + 1 $$

which is a polynomial of degree $(k - 1)$.
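The general conclusion may be verified numerically for any particular polynomial. In the Python sketch below, the function name and the cubic chosen are illustrative assumptions; each column of differences is formed from the one before by subtracting adjacent entries.

```python
def difference_table(values):
    """Return the columns of a difference table: columns[0] is the
    function itself, columns[k] holds the k-th differences."""
    columns = [list(values)]
    while len(columns[-1]) > 1:
        previous = columns[-1]
        columns.append([b - a for a, b in zip(previous, previous[1:])])
    return columns

# An arbitrary cubic, tabulated at x = 0, 1, ..., 6
cubic = [2 - x + 3 * x**2 + 5 * x**3 for x in range(7)]
columns = difference_table(cubic)
print(columns[3])   # third differences
print(columns[4])   # fourth differences
```

For a polynomial of the $r$th degree with unit interval the constant $r$th difference is $r!$ times the coefficient of $x^r$, which for the cubic above is $6 \times 5$.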

Newton’s Formula. 

24.6. Evidently these results hold out some possibility of generalising
our method of interpolation. Instead of only considering two successive
values of $u_x$, say $u_0$ and $u_1$, and using the linear relation between $u_x$ and $x$
that will reproduce these values to give any required intermediate value
of $u_x$, we can use the polynomial of the second degree which will reproduce
three adjacent values, $u_0$, $u_1$, $u_2$, or that of the third degree which will
reproduce four, $u_0$, $u_1$, $u_2$, $u_3$, and evidently we shall be likely to get much
more precise results. But to do this we must be able to obtain the required
polynomials in terms of the differences. We shall use the notation already
introduced, i.e.


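$x$ | Function | First Diffs. | Second Diffs. | Third Diffs. | Fourth Diffs.
0 | $u_0$ | $\Delta_0^1$ | $\Delta_0^2$ | $\Delta_0^3$ | $\Delta_0^4$
1 | $u_1$ | $\Delta_1^1$ | $\Delta_1^2$ | $\Delta_1^3$ | -
2 | $u_2$ | $\Delta_2^1$ | $\Delta_2^2$ | - | -
3 | $u_3$ | $\Delta_3^1$ | - | - | -
4 | $u_4$ | - | - | - | -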


Further, the common interval for the values of $x$ will be taken as unity,
as shown; in practical work this is always treated as the unit until the
end of the work, just as the class interval is so treated when calculating
the moments of a frequency-distribution.

24.7. Now write down the leading term and leading differences at
the head of a table with spacious columns, as below, up to the leading
fourth difference, and fill in the rest of the table working back from right to
left. In column 5 for third differences we can fill in only the second
space, $\Delta_0^3 + \Delta_0^4$. In column 4 for second differences the second term
will be $\Delta_0^2 + \Delta_0^3$ (always adding from the line above to the right); the
third term will be $\Delta_0^2 + 2\Delta_0^3 + \Delta_0^4$. We leave the student to supply the
remainder.


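2. $u_x$ | 3. First Diffs. | 4. Second Diffs. | 5. Third Diffs. | 6. Fourth Diffs.
$u_0 = u_0$ | $\Delta_0^1$ | $\Delta_0^2$ | $\Delta_0^3$ | $\Delta_0^4$
$u_1 = u_0 + \Delta_0^1$ | $\Delta_0^1 + \Delta_0^2$ | $\Delta_0^2 + \Delta_0^3$ | $\Delta_0^3 + \Delta_0^4$ | -
$u_2 = u_0 + 2\Delta_0^1 + \Delta_0^2$ | $\Delta_0^1 + 2\Delta_0^2 + \Delta_0^3$ | $\Delta_0^2 + 2\Delta_0^3 + \Delta_0^4$ | - | -
$u_3 = u_0 + 3\Delta_0^1 + 3\Delta_0^2 + \Delta_0^3$ | $\Delta_0^1 + 3\Delta_0^2 + 3\Delta_0^3 + \Delta_0^4$ | - | - | -
$u_4 = u_0 + 4\Delta_0^1 + 6\Delta_0^2 + 4\Delta_0^3 + \Delta_0^4$ | - | - | - | -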


Now look at the numerical coefficients in the expressions for $u_0$, $u_1$, $u_2$,
etc.; they run

1
1 + 1
1 + 2 + 1
1 + 3 + 3 + 1
1 + 4 + 6 + 4 + 1.

These are familiar figures; they are the terms in the binomial expansions
of $(1 + 1)^0$, $(1 + 1)^1$, $(1 + 1)^2$, $(1 + 1)^3$, etc. We then have, generally,

$$ u_x = u_0 + x\Delta_0^1 + \frac{x(x-1)}{1 \cdot 2}\Delta_0^2 + \frac{x(x-1)(x-2)}{1 \cdot 2 \cdot 3}\Delta_0^3 + \dots \qquad (24.5) $$

where the series of differences may be continued so far as is necessary to
give a result of the precision desired. This important equation is known
as Newton’s Rule or Newton’s Formula. It may be repeated that
in this form of the equation the unit of $x$ is the interval. There are many
other formulae of interpolation, but we propose to limit ourselves to this
and illustrate its uses.
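Equation (24.5) lends itself to a very short routine, since each binomial coefficient is obtained from the one before by a single multiplication. The Python sketch below uses a function name of our own choosing; $x$ is measured in units of the tabular interval from the leading term, as in the text.

```python
def newton_forward(leading_term, leading_diffs, x):
    """Newton's formula (24.5): u_x = u_0 + x*D1 + x(x-1)/2! * D2 + ...
    `leading_diffs` holds the leading differences D1, D2, D3, ...
    and x is in units of the tabular interval."""
    total = leading_term
    coefficient = 1.0
    for order, difference in enumerate(leading_diffs, start=1):
        # x(x-1)...(x-order+1)/order!  built up from the previous coefficient
        coefficient *= (x - (order - 1)) / order
        total += coefficient * difference
    return total

# Cubes (Table 24.3): leading term 0, leading differences 1, 6, 6
print(newton_forward(0, [1, 6, 6], 2.5))   # compare with 2.5 ** 3
print(newton_forward(0, [1, 6, 6], 4))     # reproduces the tabulated 64
```

Because the third differences of the cubes are constant, the series terminates and the interpolated value agrees with the cube calculated directly.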

24.8. It will be seen that, if the series on the right of (24.5) is
terminated at $\Delta_0^r$, the expression is a polynomial of the $r$th degree in $x$, though it
is not arranged according to powers of $x$ but according to the successive
orders of difference, which is more convenient for our present purpose.
This polynomial passes through the $r + 1$ successive points $(0, u_0)$, $(1, u_1)$,
$(2, u_2)$, . . . $(r, u_r)$. In particular, if the series terminates at $\Delta_0^1$, we
have simple interpolation and the polynomial reduces to the straight line
passing through $(0, u_0)$ and $(1, u_1)$. If it terminates at $\Delta_0^2$, the series
represents a parabola of the second degree passing through the three points
$(0, u_0)$, $(1, u_1)$, $(2, u_2)$. If it terminates at $\Delta_0^3$, it represents a polynomial
of the third degree passing through the four points $(0, u_0)$, $(1, u_1)$, $(2, u_2)$,
$(3, u_3)$: and so on. But the student must remember that even though
the polynomial reproduces the values of the function at 0, 1, 2 and 3, it
does not necessarily closely reproduce the function at intermediate values
of $x$. The whole utility of the formula is dependent on the closeness with
which the variable can be represented locally by a polynomial of fairly low
degree. Most ordinary functions satisfy this condition when tabulated
for small intervals, but occasionally the student may find himself in
difficulties. We will give some examples in later sections.

We now proceed to some illustrations, and will give a warning at
once; the student must be very careful as to signs.

Example 24.1. Given the cubes below, required to find the cube
of 32.4.

We give this first as an example in which the interpolation is exact,
for the third differences are constant, so that we need not proceed further.

Number | Cube | $\Delta^1$ (+) | $\Delta^2$ (+) | $\Delta^3$ (+)
31 | 29791 | 2977 | 192 | 6
32 | 32768 | 3169 | 198 | 6
33 | 35937 | 3367 | 204 | -
34 | 39304 | 3571 | - | -
35 | 42875 | - | - | -

As interpolation is exact, it does not matter which term we take as
$u_0$. Supposing we take 32. Thus for 32.4, $x = 0.4$, and we have:

$$ u_{0.4} = u_0 + 0.4\Delta_0^1 + \frac{(0.4)(-0.6)}{2}\Delta_0^2 + \frac{(0.4)(-0.6)(-1.6)}{6}\Delta_0^3 $$
$$ = 32768 + 0.4(3169) - 0.12(198) + 0.064(6) $$
$$ = 32768 + 1267.6 - 23.76 + 0.384 $$
$$ = 34012.224 $$

This may be verified by direct multiplication, or from Barlow’s Tables:
the student is recommended to carry out a check by taking 31 as $u_0$.

^Example 24,2 .—Given the following cube mots, find the cube root of 
102*5, The differences have been written, as is frequently done, without 
the insertion of the decimal point. 


Number. 

Cube Root. 

A 1 ( O* 

A a <~). 

A 3 ( (•)• 

101 

4 0570095 

153192 

I 997 

14 

102 

4 0723287 

1 152195 

983 

! 

103 

4 0875482 

151212 

, _ 1 

i 

104 

1 4 7020094 

— 

1 1 
i 

1 


Here, if we wish to attain the greatest possible precision and include the
third difference, we can only take 101 as $u_0$; $x$ is then 1.5, and

\[
\begin{aligned}
u_{1.5} &= u_0 + 1.5\,\Delta_0^1 + 0.375\,\Delta_0^2 - 0.0625\,\Delta_0^3 \\
        &= 4.6570095 + 0.02297880 - 0.00003739 - 0.00000009 \\
        &= 4.67995082
\end{aligned}
\]

Here we have retained an extra place of decimals throughout the
arithmetic in order to get the seventh place correct in the final result, and must
round this off to 4.6799508. Even so, we cannot avoid the effect of errors
in our data, viz. the errors of rounding off, in the seventh place of decimals,
the tabulated cube roots: the seventh place in our answer is still liable
to an error of ±1 to ±2 for this reason.

It may be noted that, as differences converge so rapidly in this
example, simple interpolation would give an error of little more than a
unit in the fifth place of decimals.
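The closing remark of Example 24.2 can be illustrated numerically. The sketch below is an addition, not the authors' working; `newton_forward` is a hypothetical helper name. It compares the third-difference result with simple (first-difference) interpolation between 102 and 103.

```python
def newton_forward(values, x):
    """Newton's formula at x, in tabular intervals from values[0], using all differences."""
    row, total, coeff, k = list(values), 0.0, 1.0, 0
    while row:
        total += coeff * row[0]
        coeff *= (x - k) / (k + 1)
        row = [b - a for a, b in zip(row, row[1:])]
        k += 1
    return total


roots = [4.6570095, 4.6723287, 4.6875482, 4.7026694]   # cube roots of 101 to 104

third_order = newton_forward(roots, 1.5)     # 101 as u0, x = 1.5
simple = newton_forward(roots[1:3], 0.5)     # 102 and 103 only, x = 0.5
true_value = 102.5 ** (1 / 3)

print(round(third_order, 7))
print(simple - true_value)   # error of simple interpolation
```

The error of the simple interpolation comes out at a little over one unit in the fifth decimal place, in line with the remark in the text.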

Example 24.3. From the table of Ordinates of the Normal Curve
(Appendix Table I) find the value of the ordinate at $x/\sigma = 0.045$.

We give this example partly as a warning to the student to see that
his differences are converging so as to be likely to give a good result.
The second difference is numerically much larger than the first, viz.
392 against 199; he must then look at the third as well; if this be large
also, he may have to go to a high order of differences to get precision.
But the third difference is only +18 and the fourth difference smaller
still, so third differences will suffice for the highest precision attainable
with the five-figure table. Note that the first difference is negative, the
second negative, the third positive, and since the interval is 0.1, $x = 0.45$,
not 0.045.

In the difference terms we have retained two decimals beyond the
five during the work (separated by a comma):

\[
\begin{aligned}
u_{0.45} &= u_0 + 0.45\,\Delta_0^1 - 0.12375\,\Delta_0^2 + 0.0639375\,\Delta_0^3 \\
         &= 0.39894 - 0.00089{,}55 + 0.00048{,}51 + 0.00001{,}15 \\
         &= 0.39854 \text{ rounded off to the fifth place}
\end{aligned}
\]

Interpolating in the seven-figure table, Table II in "Tables for Statisticians
and Biometricians," this is found correct to the last place. It may be
noted that, if a calculating machine is used, the products given by
successive terms can be cumulated on the machine.
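A sketch of Example 24.3 in Python follows. It is an illustration under stated assumptions: the Appendix table is not reproduced here, so the five-figure ordinates are regenerated from the normal density and rounded, and the helper names are hypothetical. The point to notice is that signed differences are carried through automatically, and that $x$ is 0.45 on the scale of the tabular interval.

```python
import math


def leading_differences(values):
    """Return [u0, Δ¹u0, Δ²u0, ...] for an equally spaced series."""
    leading, row = [], list(values)
    while row:
        leading.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return leading


def newton_forward(values, x):
    """Newton's formula at x, measured in tabular intervals from values[0]."""
    total, coeff = 0.0, 1.0
    for k, diff in enumerate(leading_differences(values)):
        total += coeff * diff
        coeff *= (x - k) / (k + 1)
    return total


def ordinate(z):
    """Ordinate of the normal curve with unit standard deviation."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


# Assumed stand-in for the five-figure table: ordinates at x/σ = 0, 0.1, 0.2, 0.3.
table = [round(ordinate(0.1 * i), 5) for i in range(4)]
diffs = leading_differences(table)

# The interval is 0.1, so x/σ = 0.045 corresponds to x = 0.45.
estimate = newton_forward(table, 0.45)
print(table, [round(d * 1e5) for d in diffs[1:]], round(estimate, 5))
```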

Interpolation of Statistical Series. 

24.9. So far we have dealt with straightforward interpolation of
tabulated mathematical functions. But interpolation may also be
employed on statistical series, or series of figures founded on statistics,
provided at least that they run tolerably smoothly. No statistical series
or series founded on statistics does, however, run absolutely smoothly,
like a mathematical function, unless of course it has been deliberately
"graduated" to do so. It must be recognised, therefore, in such cases
that we are merely using interpolation as a method of estimating the truth;
and the truth in all probability would not and could not be given by any
process of interpolation.

The following is an illustration of a series based on statistics.

Example 24.4. In Part II of the Supplement to the 75th Report
of the Registrar-General for England and Wales, abridged life-tables
were given for a number of counties, etc. The table below shows the
expectation of life at ages 25, 35, etc. to 85, based on the mortality of
males in Cambridgeshire in 1910-12, i.e. the average number of years
that individuals would have lived from the given age onwards, if subjected
at each age to the mortality mentioned. Required, to interpolate values
for the expectation of life at ages 30, 40, etc.



                     Expectation
   Age.                of Life         Δ¹.       Δ².       Δ³.
                       (Males).

    25                  42.21         -824       +20       +34
    35                  33.97         -804       +54       +27
    45                  25.93         -750       +81       +76
    55                  18.43         -669      +157        -3
    65                  11.74         -512      +154
    75                   6.62         -358
    85                   3.04

 Total                               -3917      +466      +134
 Bottom figures
   less top            -39.17         +466      +134




Tables of mathematical functions will often give the differences, but
in dealing with data of this kind the student will certainly have to form
them himself, and should carry out the check shown. Having formed the
column of first differences, he should take the total, of course paying
attention to signs. In this case the total of first differences is -3917,
or inserting the decimal point, -39.17. This obviously must be equal
to the difference between the bottom figure and the top figure in the
preceding column, as we see is the case. The following columns must
be checked similarly.
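The check just described is easily mechanised. The following is a minimal sketch, not taken from the book; the function name `difference_columns` is an assumption, and the expectations are entered in hundredths of a year so that the arithmetic is exact.

```python
# Expectation of life at ages 25, 35, ..., 85, in hundredths of a year.
expectation = [4221, 3397, 2593, 1843, 1174, 662, 304]


def difference_columns(values, max_order):
    """Return [values, first differences, second differences, ...]."""
    columns = [list(values)]
    for _ in range(max_order):
        prev = columns[-1]
        columns.append([b - a for a, b in zip(prev, prev[1:])])
    return columns


columns = difference_columns(expectation, 3)
totals = []
for order in (1, 2, 3):
    total = sum(columns[order])
    bottom_less_top = columns[order - 1][-1] - columns[order - 1][0]
    # The total of each column must equal bottom less top in the preceding column.
    assert total == bottom_less_top
    totals.append(total)

print(totals)
```

The identity holds for any series, since the differences in a column telescope; a failure therefore points to a slip in forming that column.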

The second differences are considerably smaller than the first
differences. Third differences are also small, but rather irregular; it will be
found, however, that the contributions of the third differences affect only
the second place of decimals in the function, so we ought to attain a very
fair result.

To get the figures for ages 30 and 40 we have not much choice and must
use the known values at ages 25 to 55. On general grounds it seems
best to keep the value of $x$ for which we require $u_x$ near the centre of the
values used for interpolation. So the expectation at 50 was determined
from the values at 35 to 65, that at 60 from the values at 45 to 75, and
that at 70 from the values at 55 to 85. The expectation at 80 was
determined with the use of the second difference only from the values at
65, 75, 85.

The work is quite straightforward and the results were: 30, 38.09;
40, 29.90; 50, 22.10; 60, 14.94; 70, 8.99; 80, 4.64. The student
may find it instructive to draw a chart.

But some qualms were felt as to how far the results could be trusted.
A polynomial is not a very good function to represent an empirical function
of the present kind which is slowly dropping to zero (see below, 24.12).
It might possibly be more appropriate to take logarithms of the
expectations, interpolate between the logarithms and then convert back into
numbers. The test was carried out as a control. The following are then
the data and the differences:


   Age.           log (Expectation).      Δ¹.          Δ².          Δ³.

    25                 1.62542         -0.09432     -0.02298     -0.00799
    35                 1.53110         -0.11730     -0.03097     -0.01662
    45                 1.41380         -0.14827     -0.04759     -0.00536
    55                 1.26553         -0.19586     -0.05295     -0.03623
    65                 1.06967         -0.24881     -0.08918
    75                 0.82086         -0.33799
    85                 0.48287

 Total                                 -1.14255     -0.24367     -0.06620
 Bottom figures
   less top           -1.14255         -0.24367     -0.06620


The work was done exactly as before, except that the expectation at
80 was obtained with three differences from the given values at 55 to 85.
The results differed only very slightly from those obtained before, the
following table giving a complete comparison:

                  Interpolation.
   Age.       Direct.    Logarithmic.    Difference.

    25         42.21        42.21
    30         38.09        38.07          -0.02
    35         33.97        33.97
    40         29.90        29.91          +0.01
    45         25.93        25.93
    50         22.10        22.11          +0.01
    55         18.43        18.43
    60         14.94        14.92          -0.02
    65         11.74        11.74
    70          8.99         9.00          +0.01
    75          6.62         6.62
    80          4.64         4.63          -0.01
    85          3.04         3.04

The differences are almost immaterial.
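The whole of Example 24.4 can be retraced with a short program. The sketch below is an illustration rather than the authors' procedure: the helper names are assumptions, and full-precision logarithms are used in place of the five-figure logarithms of the text, so agreement with the printed values can be expected only to about a unit in the second decimal place. The sets of tabulated ages used for each interpolated age follow the description given in the text.

```python
import math

ages = [25, 35, 45, 55, 65, 75, 85]
expectation = [42.21, 33.97, 25.93, 18.43, 11.74, 6.62, 3.04]


def newton_forward(values, x):
    """Newton's formula at x, in tabular intervals from values[0], using all differences."""
    row, total, coeff, k = list(values), 0.0, 1.0, 0
    while row:
        total += coeff * row[0]
        coeff *= (x - k) / (k + 1)
        row = [b - a for a, b in zip(row, row[1:])]
        k += 1
    return total


def interpolate(series, target, first_age, last_age):
    """Interpolate at `target` from the tabulated ages first_age to last_age."""
    lo, hi = ages.index(first_age), ages.index(last_age)
    return newton_forward(series[lo:hi + 1], (target - first_age) / 10)


# Sets of tabulated ages used for each interpolated age, as stated in the text.
direct_sets = {30: (25, 55), 40: (25, 55), 50: (35, 65),
               60: (45, 75), 70: (55, 85), 80: (65, 85)}
log_sets = {**direct_sets, 80: (55, 85)}   # age 80: three differences on the logarithms

logs = [math.log10(v) for v in expectation]
direct = {a: interpolate(expectation, a, *s) for a, s in direct_sets.items()}
logarithmic = {a: 10 ** interpolate(logs, a, *s) for a, s in log_sets.items()}

for age in sorted(direct):
    print(age, round(direct[age], 2), round(logarithmic[age], 2))
```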

Notes on the Practical Work. 

24.10. Number of Differences to Use .—Provided differences converge 
fairly rapidly and continuously, there is little difficulty in coming to a 
decision. The student knows to how many digits he desires to be accurate, 
and it is no use his going on to higher orders of difference which 
affect only places beyond this ; if he wants four-figure accuracy, it is no 
good his going on to differences which affect only the sixth and seventh 
places. To enable him to see more quickly the approximate contribution 
that a difference of any order will give, the following table of the binomial 
coefficients may be useful:— 

Table 24.4. Table of the Binomial Coefficients in Newton's Formula from
x = 0 to x = 2 by Intervals of 0.1.

          x(x-1)      x(x-1)(x-2)     x(x-1)(x-2)(x-3)
   x      ------      -----------     ----------------
           1.2           1.2.3             1.2.3.4

  0        0             0                 0
  0.1     -0.045        +0.0285           -0.0206625
  0.2     -0.08         +0.048            -0.0336
  0.3     -0.105        +0.0595           -0.0401625
  0.4     -0.12         +0.064            -0.0416
  0.5     -0.125        +0.0625           -0.0390625
  0.6     -0.12         +0.056            -0.0336
  0.7     -0.105        +0.0455           -0.0261625
  0.8     -0.08         +0.032            -0.0176
  0.9     -0.045        +0.0165           -0.0086625
  1.0      0             0                 0
  1.1     +0.055        -0.0165           +0.0078375
  1.2     +0.12         -0.032            +0.0144
  1.3     +0.195        -0.0455           +0.0193375
  1.4     +0.28         -0.056            +0.0224
  1.5     +0.375        -0.0625           +0.0234375
  1.6     +0.48         -0.064            +0.0224
  1.7     +0.595        -0.0595           +0.0193375
  1.8     +0.72         -0.048            +0.0144
  1.9     +0.855        -0.0285           +0.0078375
  2.0     +1             0                 0

A word of warning may, however, be desirable. Because the use of the
(r+1)th difference would not affect the result in the kth figure, it does
not necessarily follow that this polynomial value will agree with the true
value of the function to the kth figure.
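The binomial coefficients of Table 24.4 are easily regenerated for any value of $x$. The following is a minimal sketch; the function name `binomial_coefficients` is an illustrative assumption. Multiplying a coefficient by the corresponding leading difference shows at once which decimal place that difference can affect.

```python
def binomial_coefficients(x, max_order=4):
    """Coefficients of Δ¹, Δ², ..., up to order max_order in Newton's formula at x."""
    coeffs, c = [], 1.0
    for k in range(max_order):
        c *= (x - k) / (k + 1)       # x(x-1)...(x-k) / (k+1)!
        coeffs.append(c)
    return coeffs


# Reprint the three tabulated columns from x = 0 to x = 2 by intervals of 0.1.
for tenth in range(0, 21):
    x = tenth / 10
    _, c2, c3, c4 = binomial_coefficients(x)
    print(f"{x:3.1f}  {c2:+8.4f}  {c3:+8.4f}  {c4:+11.7f}")
```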

If differences do not converge rapidly and continuously, this is in
itself evidence that a polynomial of moderately high order does not fit
the function well and high precision cannot be expected. The student
may occasionally find himself faced by cases more difficult than those of
the foregoing illustrations. For example, here are the initial values of
$P$ for values of $\chi^2$ proceeding by unity, and degrees of freedom $\nu = 6$
($n' = 7$), from Table XII in "Tables for Statisticians, etc., Part I":

   χ².        P.            χ².        P.

    0      1.000000          5      0.543813
    1      0.985612          6      0.423190
    2      0.919699          7      0.320847
    3      0.808847          8      0.238103
    4      0.676676          9      0.173578


If we wish to find by interpolation the value at, say, 0.5, apparently we
have no choice but to take our $u_0$ at zero, for the table starts there. If
the student begins work accordingly, he will find his differences not
behaving at all nicely; the second leading difference is much greater than
the first; the third is a good deal less, but the fourth, fifth and sixth
much larger than the third, and it is not until the seventh and higher
differences that definite convergence seems to be setting in. If he
laboriously works step by step, getting successive approximations to the
value of $P$ at 0.5 by using one difference, two differences and so on, he
will get a series of very slowly converging values:

1. 0.992806
2. 0.999247
3. 0.999658
4. 0.998993
5. 0.998445
6. 0.998131
7. 0.997973
8. 0.997899
9. 0.997865

The true value is 0.997839, and he could have obtained this much quicker
by direct calculation; even with the nine differences he has got only
four-figure accuracy. But he ought not to have expected a good result if he
had taken the trouble to look at the run of the differences. The figures
give another useful warning. Using three differences, we have a worse
result than when using two only. Increasing the number of differences by
one step does not necessarily increase precision.
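The slow convergence can be reproduced directly from the ten tabulated values. The sketch below is an addition with hypothetical helper names; because it works from the six-figure table entries, the last printed digit may differ by a unit from the figures quoted in the text.

```python
p_values = [1.000000, 0.985612, 0.919699, 0.808847, 0.676676,
            0.543813, 0.423190, 0.320847, 0.238103, 0.173578]   # χ² = 0, 1, ..., 9


def leading_differences(values):
    """Return [u0, Δ¹u0, Δ²u0, ...] for an equally spaced series."""
    leading, row = [], list(values)
    while row:
        leading.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return leading


x = 0.5
diffs = leading_differences(p_values)
approximations, total, coeff = [], diffs[0], 1.0
for k in range(1, len(diffs)):
    coeff *= (x - (k - 1)) / k           # binomial coefficient of the kth difference
    total += coeff * diffs[k]
    approximations.append(total)         # value using differences up to order k

true_value = 0.997839                    # as stated in the text
for k, value in enumerate(approximations, start=1):
    print(k, f"{value:.6f}", f"{value - true_value:+.6f}")
```

The printed errors show the point made above: the approximation with three differences lies further from the true value than that with two.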

Limitation of the number of differences suitable for use, owing to the 
effect on differences of errors of rounding off, is considered below (24.14 
and 24.15). 






24.11. Choice of the Set of u's. To interpolate, say, at $x = 2.5$, using
third differences, one might employ either the u's at 0, 1, 2, 3, or those
at 1, 2, 3, 4, or those at 2, 3, 4, 5; one would not go outside these limits or
one would have to extrapolate for the value at 2.5, and that would obviously
be unsafe. Which set is it best to choose? Advice cannot be absolutely
definite, but it would seem that usually (but not necessarily) values about
equidistant from that sought should be equally valuable as guides, and on
this principle we should try to keep the value sought so far as possible
central to the set of u's employed.

This suggests that one reason for our getting so poor a result above was
that we used such a lop-sided set of u's, with the value sought apparently
unavoidably near one end. Let us avoid this by a device. Repeat the
value of $P$ for +1 at -1 on the other side of zero. (It is true that this has
no physical meaning, but the function might conceivably run symmetrically
on either side of zero, and its graph has clearly high-order contact with
a horizontal tangent at zero.) Now take the four values at -1, 0, +1, +2
and interpolate, using the resulting three differences only:

   χ².        P.           Δ¹.           Δ².           Δ³.

   -1      0.985612     +0.014388     -0.028776     -0.022749
    0      1            -0.014388     -0.051525
   +1      0.985612     -0.065913
   +2      0.919699

Interpolating for the value of $u_{1.5}$, we have:

\[
\begin{aligned}
u_{1.5} &= u_0 + 1.5\,\Delta_0^1 + 0.375\,\Delta_0^2 - 0.0625\,\Delta_0^3 \\
        &= 0.997825
\end{aligned}
\]

The true value, as stated above, is 0.997839, and we have got a closer
result by this rearrangement, using third differences only, than we did by
using nine differences before.
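The device is quickly verified. In the sketch below (an illustrative addition; `newton_forward` is an assumed helper name) the value at +1 is repeated at -1, so that the point sought lies at $x = 1.5$ on a scale starting from -1.

```python
def newton_forward(values, x):
    """Newton's formula at x, in tabular intervals from values[0], using all differences."""
    row, total, coeff, k = list(values), 0.0, 1.0, 0
    while row:
        total += coeff * row[0]
        coeff *= (x - k) / (k + 1)
        row = [b - a for a, b in zip(row, row[1:])]
        k += 1
    return total


# P at χ² = -1 (the value at +1 repeated), 0, +1, +2.
reflected = [0.985612, 1.000000, 0.985612, 0.919699]

estimate = newton_forward(reflected, 1.5)    # χ² = 0.5 is 1.5 intervals from -1
print(f"{estimate:.6f}")
```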

24.12. Possible Forms of Polynomials. The student may also get
into difficulties if he does not bear in mind the forms that polynomials can,
and cannot, take; and if he attempts to use this method of interpolation
where the polynomial is unlikely to represent the function well even over
a moderate range. A polynomial (parabola) of the second order can take
only the form (a) in fig. 24.1. A polynomial of the third order can take the
form (b), or the form (c) with a wave in the centre. A polynomial of the
fourth order can take a form very much resembling (b), but flatter in the
centre, or a form like (c), but with three instead of two half-waves in the
middle; and so on. A polynomial cannot take the form (1) of a curve
tangential or asymptotic to the vertical, like the end near zero of an ideal
frequency-curve of the distribution-of-wealth type, or (2) of a curve
slowly dropping asymptotically to the horizontal, like a logarithmic curve
or the tail of the normal curve; and such functions, mathematical or
empirical, are very frequent in statistics. In this latter case it would be
more probable that the function could be represented by a function of the
form

\[ y = e^{a_0 + a_1 x + a_2 x^2 + \dots} \]

Then taking logs we have:

\[ u = \log_e y = a_0 + a_1 x + a_2 x^2 + \dots \]

that is to say, we come back to the polynomial. Hence, if the function
we are dealing with is tailing slowly away to zero, it is probably best to
take logarithms and then interpolate on the logarithms. That is why in
Example 24.4 we carried out a check in that way. There, as it happened,
the direct method did not lead to bad results, but it is quite possible for it to
give a completely nonsensical answer. For example, at the extreme end
of the $\chi^2$ table for $\nu = 28$ ($n' = 29$), we are given only the values of $P$
corresponding to the following values of $\chi^2$:


   χ².        P.           Δ¹.           Δ².           Δ³.

   40      0.066128     -0.059661     +0.053601     -0.047929
   50      0.006467     -0.006060     +0.005672
   60      0.000407     -0.000388
   70      0.000019

Taking differences as shown and interpolating to get an estimate of the
value of $P$ for $\chi^2 = 55$, i.e. $u_{1.5}$, we have:

\[
\begin{aligned}
u_{1.5} &= u_0 + 1.5\,\Delta_0^1 + 0.375\,\Delta_0^2 - 0.0625\,\Delta_0^3 \\
        &= -0.000268
\end{aligned}
\]

But this is nonsense, for $P$ cannot be negative. The polynomial has done
its best: it reproduces the values at 40, 50, 60 and 70, but it can only do
this by taking a form like (c) of fig. 24.1 (reversed) with a wave in
the centre. It has, as a matter of fact, a minimum at $\chi^2 = 56.6$ and a
maximum at $\chi^2 = 65.8$, or at 1.66 and 2.58 on the scale of u's with 40
as zero and 10 as the unit interval.

If, instead, we take logarithms of the above values of $P$, interpolate
to third differences and then convert back to numbers, as in
Example 24.4, we find 0.001699 for the required value of $P$, a value
which is rational and is probably not far from the truth. For $\chi^2 = 30$,
$P = 0.363218$. Even bringing in this much larger value and using
logarithmic interpolation with four differences, we find 0.001746 for the
value of $P$ at $\chi^2 = 55$. This suggests that at least we may trust the value
to two figures as 0.0017, which would be sufficient for practice; but the
value has not been checked by direct calculation.
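The contrast between direct and logarithmic interpolation in this case may be sketched as follows. This is an illustration only; the helper name is an assumption, and full-precision logarithms are used, so the last digit of the logarithmic result should not be pressed.

```python
import math


def newton_forward(values, x):
    """Newton's formula at x, in tabular intervals from values[0], using all differences."""
    row, total, coeff, k = list(values), 0.0, 1.0, 0
    while row:
        total += coeff * row[0]
        coeff *= (x - k) / (k + 1)
        row = [b - a for a, b in zip(row, row[1:])]
        k += 1
    return total


p_values = [0.066128, 0.006467, 0.000407, 0.000019]   # P at χ² = 40, 50, 60, 70

# Direct interpolation at χ² = 55 (x = 1.5): the cubic dips below zero.
direct = newton_forward(p_values, 1.5)

# Interpolating on the logarithms and converting back keeps the result positive.
via_logs = 10 ** newton_forward([math.log10(p) for p in p_values], 1.5)

print(f"{direct:.6f}", f"{via_logs:.6f}")
```

Whatever the values interpolated on the logarithmic scale, converting back gives a positive number, which is the practical reason for preferring this route in the tail of such a table.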



Effect of Errors in u on the Differences. 

24.13. The student may notice and be troubled by the fact that, in
the Normal Curve Tables in the Appendix, second differences appear to
get a little irregular towards the tail of the curve; the phenomenon will
become much more evident if he continues the second differences rather
further than they have been entered, and still more so in the higher
differences if he proceeds to write them out. The irregularities in question are
due solely to the errors of rounding off in the last decimal place of the
function. Before proceeding to consider the total effect of such a system
of errors it may be best to consider the effect of a single error.

24.14. Effect of an Error in a Single Value of u. If $u = v + w$,
$\Delta u = \Delta v + \Delta w$, and so on for all orders of differences. Hence, if $v$ represents
the true value of $u$ and $w$ represents an error, the differences of the error
will simply be superposed on the differences of $v$, and we may consider the
former by themselves. We may then, as below, take the true values of $u$
as zero, and insert an error only at one point, say $+e$.

    u.      Δ¹.     Δ².     Δ³.     Δ⁴.     Δ⁵.     Δ⁶.

    0        0       0       0       0       0      +e
    0        0       0       0       0      +e     -6e
    0        0       0       0      +e     -5e    +15e
    0        0       0      +e     -4e    +10e    -20e
    0        0      +e     -3e     +6e    -10e    +15e
    0       +e     -2e     +3e     -4e     +5e     -6e
   +e       -e      +e      -e      +e      -e      +e
    0        0       0       0       0       0       0
    0        0       0       0       0       0       0


The resulting differences are written down above, up to those of the sixth
order, and it is evident that the numerical coefficients of $e$ in the differences
of order $r$ are given by the terms of $(1 - 1)^r$. The effect of the initial
error is therefore very rapidly increased as we proceed to higher and higher
orders of difference, especially after the first three differences are past. An
error of $+e$ in $u$ can produce an error of $+3e$ or $-3e$ in the third differences,
of $6e$ in the fourth differences, of $10e$ in the fifth and of $20e$ in the sixth.
The maximum numerical coefficient for order $r$ is derived from that for
order $r - 1$ by multiplying the latter by 2 if $r$ is even, or by $2r/(r + 1)$ if
$r$ is odd.
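The growth of a single error under repeated differencing can be seen by differencing a series that is zero except for a unit error at one point. The following sketch is an addition; the helper name `difference` is an assumption.

```python
from math import comb


def difference(row):
    """First differences of a series."""
    return [b - a for a, b in zip(row, row[1:])]


row = [0] * 6 + [1] + [0] * 6      # true values zero, unit error e = 1 at one point
largest = []
for order in range(1, 7):
    row = difference(row)
    largest.append(max(abs(v) for v in row))

print(largest)                     # largest coefficient of e in each order
print(row)                         # sixth differences: the terms of (1 - 1)^6
```

The largest coefficient in order $r$ is the central binomial coefficient of $(1 - 1)^r$, which is what the multiplying rule in the text generates step by step.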

This magnification of the error renders differencing a very useful
method of checking the calculated table of a function, and it is often
employed for that purpose. The matter is not quite simple, for the effects
of errors of rounding off in the last decimal place will be superposed on the
effects of any actual mistake, but nevertheless the effects of the mistake
are likely to show themselves clearly in, say, third or fourth differences.
In the following table of square roots, for example, nothing is obviously
wrong, but an error of 2 units in the last place has been introduced into the
square root of 15, which should read 3.87298 (or more precisely, 3.8729833).
When we proceed to take differences, however, a suspicious irregularity
shows itself in the third differences, and in the fourth differences it is clear
that something is wrong. Since the position of the "peak" rises half a
line at each differencing, the peak +2 shows that the mistake is in the
root of 15.

Number.   Square Root.    Δ¹ (+).    Δ² (-).   Δ³ (+).    Δ⁴.

   10       3.16228       0.15434      686       83       -14
   11       3.31662       0.14748      603       69       -12
   12       3.46410       0.14145      534       57       -14
   13       3.60555       0.13611      477       43       + 2
   14       3.74166       0.13134      434       45       -14
   15       3.87300       0.12700      389       31         0
   16       4             0.12311      358       31       - 6
   17       4.12311       0.11953      327       25
   18       4.24264       0.11626      302
   19       4.35890       0.11324
   20       4.47214

We can even estimate the magnitude of the error. If the fifth
differences may be taken as approximately constant, we ought to get a fair
estimate of the true fourth difference at the peak +2 by adding together
that difference and the two on either side of it, the total effect of the error
$e$ thus averaging out (compare the scheme showing the effect of the single
error given above). This average is -7.6. We then have:

\[
\begin{aligned}
6e &= +2 - (-7.6) \\
 e &= +1.6
\end{aligned}
\]

This is very near the correct value, which, as will be seen from the true
value of the root stated, is 300 - 298.33 or 1.67, the unit in the Δ⁴ column
being the last place of decimals of the function.
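The detection and estimation of the mistake can be sketched in code. The following is a hypothetical illustration, not a general-purpose routine: it assumes a single mistake lying far enough from the ends of the table for a full set of five fourth differences to surround the peak, and it locates the peak as the fourth difference furthest from the median.

```python
from statistics import median

numbers = list(range(10, 21))
# Tabulated square roots in units of the fifth decimal place (15 carries the mistake).
roots = [316228, 331662, 346410, 360555, 374166, 387300,
         400000, 412311, 424264, 435890, 447214]


def difference(row):
    """First differences of a series."""
    return [b - a for a, b in zip(row, row[1:])]


fourth = roots
for _ in range(4):
    fourth = difference(fourth)

# The peak is the fourth difference that departs most from the general run.
centre = median(fourth)
peak = max(range(len(fourth)), key=lambda i: abs(fourth[i] - centre))

# The fourth difference formed from u[i..i+4] is centred on u[i+2].
suspect = numbers[peak + 2]

# The error enters the five neighbouring differences as +e, -4e, +6e, -4e, +e,
# so their mean estimates the true fourth difference at the peak.
window = fourth[peak - 2: peak + 3]
error = (fourth[peak] - sum(window) / len(window)) / 6

print(fourth, suspect, round(error, 2))
```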

24.15. Effect of a Series of Random Errors in u. Suppose these errors
to be $a$, $b$, $c$, $d$, $e$, as below. Writing down their differences, we have the
following results:

 Error.     Δ¹.         Δ².              Δ³.                    Δ⁴.

   a       b - a     c - 2b + a     d - 3c + 3b - a     e - 4d + 6c - 4b + a
   b       c - b     d - 2c + b     e - 3d + 3c - b
   c       d - c     e - 2d + c
   d       e - d
   e

The general result is obvious. In differences of the rth order, the resultant
error in any one difference is the sum of $r + 1$ of the original errors multiplied
in succession by the terms in the binomial expansion of $(1 - 1)^r$, or is
of the form

\[
e_1 - r\,e_2 + \frac{r(r-1)}{1 \cdot 2}\,e_3 - \frac{r(r-1)(r-2)}{1 \cdot 2 \cdot 3}\,e_4 + \dots \qquad (24.6)
\]

If the errors $e$ are distributed in a purely random way, so that $e_k$ is
uncorrelated with $e_{k+h}$, and if it may be assumed that the mean error is zero,
then the mean error in the difference of the rth order will also in a long
series tend to zero, and the standard deviation, $s_r$, of the above quantity
(24.6) is given by

\[
s_r = s_0 \sqrt{F(r)} \qquad (24.7)
\]

where $s_0$ is the s.d. of the original errors $e$, and $F(r)$ is the sum of the squares
of the terms in the binomial expansion of $(1 - 1)^r$.

$F(r)$ increases very rapidly with $r$. The following table gives the value
of $F(r)$ and of its square root from $r = 1$ to $r = 6$:

    r.      F(r).     √F(r).

    1         2        1.41
    2         6        2.45
    3        20        4.47
    4        70        8.37
    5       252       15.87
    6       924       30.40


The standard deviation of errors in the fourth differences is therefore over
eight times, and in the sixth differences over thirty times, the s.d. of the
errors affecting $u$.

If the decimal place in $u$ be regarded as following the last figure
retained, the errors of rounding off that figure may be regarded as uniformly
distributed over a range ±0.5, and their standard deviation, $s_0$, is therefore
$\sqrt{1/12}$ or 0.288675. This gives the following figures for the s.d. of errors
in the successive orders of difference owing to the errors of rounding off
in $u$:

 Order of Difference.     S.d. of Errors.

          1                    0.41
          2                    0.71
          3                    1.29
          4                    2.42
          5                    4.58
          6                    8.77

The effect of the errors of rounding off evidently increases very rapidly
with the order of difference. With a mathematical function for which
the true differences rapidly and continuously converge, the effect of the
errors will in fact soon, so to speak, "take charge"; the observed
differences will rapidly and steadily diverge, growing larger with each successive
differencing. At the same time two other phenomena will show
themselves. Looking back at the scheme showing the effect of the errors
$a$, $b$, $c$, $d$, $e$, it will be seen that in any one column the same error enters
into successive differences with sign reversed. Also in any one line
the same error enters into successive differences with sign reversed.
Hence, as the effect of errors of rounding off becomes overwhelmingly
great, (1) the differences of the same order tend to alternate in sign, (2)
differences of successive orders on the same line tend to alternate in sign.
If these phenomena start to show themselves, the student may well
suspect he has gone too far in his differencing. It is evidently no use
proceeding to an order of differences mainly significant of errors.
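The two small tables above follow directly from the definition of $F(r)$ as the sum of the squares of the binomial terms. A minimal sketch, with the function name `sum_of_squares` chosen here for illustration; the comparison with the central binomial coefficient $\binom{2r}{r}$ rests on a standard identity that is not stated in the text.

```python
from math import comb, sqrt


def sum_of_squares(r):
    """F(r): sum of the squares of the terms in the expansion of (1 - 1)^r."""
    return sum(comb(r, k) ** 2 for k in range(r + 1))


s0 = sqrt(1 / 12)      # s.d. of a rounding error uniform over a range of one unit

for r in range(1, 7):
    f = sum_of_squares(r)
    # r, F(r), its square root, and the s.d. of rounding errors in the rth differences
    print(r, f, round(sqrt(f), 2), round(s0 * sqrt(f), 2))
```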





These results for the effect on differences of a random series of errors 
have an application, not only to the effect of errors of rounding off in 
mathematical tables, but also to the theory of the method of differences in 
correlation (ref. (891)). 

Effect on Differences of Subdividing an Interval. 

24.16. We mentioned early in this chapter (24.2) that, in general,
it would become possible to use simple interpolation alone on a table of
a mathematical function provided intervals were made sufficiently fine,
but this was not proved. Let us consider the effect on the differences
of subdividing an interval; it will suffice to take the case of halving it,
and for brevity let us confine ourselves to the first three differences.

In terms of Newton's formula the values of $u$ at 0, 0.5, 1, 1.5, are

\[
\begin{aligned}
u_0     &= u_0 \\
u_{0.5} &= u_0 + 0.5\,\Delta_0^1 - 0.125\,\Delta_0^2 + 0.0625\,\Delta_0^3 \\
u_1     &= u_0 + \Delta_0^1 \\
u_{1.5} &= u_0 + 1.5\,\Delta_0^1 + 0.375\,\Delta_0^2 - 0.0625\,\Delta_0^3
\end{aligned}
\]

If the student will write down these expressions at the left of a sheet
of foolscap placed lengthwise, and take the differences in the ordinary
way, he will find that the new leading differences for the subdivided
series with intervals of half the original interval are given by

\[
\begin{aligned}
\delta_0^1 &= 0.5\,\Delta_0^1 - 0.125\,\Delta_0^2 + 0.0625\,\Delta_0^3 \\
\delta_0^2 &= 0.25\,\Delta_0^2 - 0.125\,\Delta_0^3 \\
\delta_0^3 &= 0.125\,\Delta_0^3
\end{aligned}
\qquad (24.9)
\]

If the Δ's of the original series converge rapidly, an assumption really
implied by the fact that we stopped at the third difference, so that we
can regard the successive Δ's as of different orders of magnitude, it will
be seen that $\delta_0^1$ is of the order of magnitude $0.5\,\Delta_0^1$, $\delta_0^2$ is of the order of
magnitude $0.25\,\Delta_0^2$, and $\delta_0^3$ of the order of magnitude $0.125\,\Delta_0^3$. That
is to say, the new differences are not only smaller than the original
differences, but converge much more rapidly.

If we had divided the original interval into ten instead of only two
parts, we could have found the new leading differences in precisely the
same way, and would then have obtained the result that $\delta_0^1$ was of the
order of magnitude $0.1\,\Delta_0^1$, $\delta_0^2$ of the order of magnitude $0.01\,\Delta_0^2$, and
so on, the general rule being obvious. Hence it is only necessary to
subdivide the interval sufficiently in order to render the differences so
rapidly convergent that first differences alone can be used.
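Equations (24.9) can be confirmed numerically on any series with constant third differences. The sketch below is an illustration with assumed names; it borrows the leading differences of the cubes from Example 24.1, forms the values at half-intervals from Newton's formula, differences them, and compares the result with (24.9).

```python
# Leading differences of the cubes, taking 32 as u0 (Example 24.1).
u0, d1, d2, d3 = 32768, 3169, 198, 6


def u(x):
    """Newton's formula to third differences."""
    return u0 + x * d1 + x * (x - 1) / 2 * d2 + x * (x - 1) * (x - 2) / 6 * d3


def leading_differences(values):
    """Return [u0, Δ¹u0, Δ²u0, ...] for an equally spaced series."""
    leading, row = [], list(values)
    while row:
        leading.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return leading


# Values at 0, 0.5, 1, 1.5 and their leading differences on the halved interval.
observed = leading_differences([u(0.0), u(0.5), u(1.0), u(1.5)])[1:]

# The same quantities from equations (24.9).
predicted = [0.5 * d1 - 0.125 * d2 + 0.0625 * d3,
             0.25 * d2 - 0.125 * d3,
             0.125 * d3]

print(observed, predicted)
```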

In works on the method of differences, tables will usually be found 
giving for various values of the number of subdivisions the formulae 
relating the S’s to the A’s. 

We now turn to some statistical problems. 

Breaking up a Group. 

24.17. Suppose we are given the numbers living, or the numbers of
deaths, in successive ten-year age-groups, we may often desire to estimate
the numbers in smaller, e.g. five-year, age-groups, or even at single years
of age. The initial difficulty and the method of procedure will best be
shown by an illustration.

Example 24.5. The following are the numbers of deaths in four
successive ten-year age-groups. Required to estimate the numbers of
deaths at 45-50 and 50-55.


 Age group.     Deaths.

    25-         13,229
    35-         18,139
    45-         24,225
    55-         31,496


Now evidently interpolating directly between these figures will not help
us. If we interpolated directly between the figure for 35- and the figure
for 45- (half-way between), we would only have an estimate of the numbers
in the ten-year age-group 40-50. We must proceed as follows. Add
up the given numbers step by step; this will give us a new set of figures
showing the numbers over 25 but less than 35, over 25 but less than 45,
over 25 but less than 55, and over 25 but less than 65. Interpolate in
this new series to find the number over 25 but less than 50, and the
differences from the numbers next above and below will give the answer
desired. The work is as follows:


     1.                2.                3.           4.          5.
                 Sum of Deaths
 Exact Age.     from 25 to Age          Δ¹.          Δ².         Δ³.
                    Stated.

     25                 0            +13,229       +4,910      +1,176
     35            13,229            +18,139       +6,086      +1,185
     45            31,368            +24,225       +7,271
     55            55,593            +31,496
     65            87,089


Column 2 gives the numbers from age 25 up to each age stated;
column 3 the first differences, reproducing the numbers in the age-groups;
columns 4 and 5 the second and third differences. Since the two third
differences are very nearly equal, working to third differences ought to
give us a very fair result. We can accordingly take age 35 as our zero,
and age 50 will be 1.5 on the scale with the interval as unit. We have
accordingly,

\[
\begin{aligned}
u_{1.5} &= u_0 + 1.5\,\Delta_0^1 + 0.375\,\Delta_0^2 - 0.0625\,\Delta_0^3 \\
        &= 13{,}229 + 1.5(18{,}139) + 0.375(6{,}086) - 0.0625(1{,}185) \\
        &= 42{,}645.7
\end{aligned}
\]

or 42,646 to the nearest unit. Subtracting 31,368 from 42,646, and
42,646 from 55,593, we then have for our estimates of the numbers of
deaths:

    45-50    11,278
    50-55    12,947

As a matter of fact, the numbers in quinquennial groups were given, and
for 45-50, 50-55, were actually 11,404 and 12,821; the error of our
estimates accordingly is only of the order of 1 per cent.

Example 24.6. From the same data, estimate the number of deaths
in the year of age 50-51.

The limits of this group on our scale of intervals are, with 35 as origin,
1.5 and 1.6. We have already found the number up to 1.5 in Example 24.5,
and it remains only to determine the number up to 1.6, the difference
between the two figures then giving the answer sought:

\[
\begin{aligned}
u_{1.6} &= u_0 + 1.6\,\Delta_0^1 + 0.48\,\Delta_0^2 - 0.064\,\Delta_0^3 \\
        &= 13{,}229 + 1.6(18{,}139) + 0.48(6{,}086) - 0.064(1{,}185) \\
        &= 45{,}096.8
\end{aligned}
\]

or 45,097 to the nearest unit. Hence the answer is 45,097 - 42,646, or
2451.
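Examples 24.5 and 24.6 amount to interpolating in the cumulative series and differencing the results. A minimal sketch follows; the helper and variable names are assumptions made for illustration.

```python
def newton_forward(values, x):
    """Newton's formula at x, in tabular intervals from values[0], using all differences."""
    row, total, coeff, k = list(values), 0.0, 1.0, 0
    while row:
        total += coeff * row[0]
        coeff *= (x - k) / (k + 1)
        row = [b - a for a, b in zip(row, row[1:])]
        k += 1
    return total


deaths = {25: 13229, 35: 18139, 45: 24225, 55: 31496}   # ten-year age-groups

# Cumulative deaths from age 25 up to exact ages 35, 45, 55, 65.
cumulative, running = [], 0
for start in (25, 35, 45, 55):
    running += deaths[start]
    cumulative.append(running)

# Age 35 is the origin, so ages 50 and 51 are x = 1.5 and x = 1.6.
up_to_50 = round(newton_forward(cumulative, 1.5))
up_to_51 = round(newton_forward(cumulative, 1.6))

deaths_45_50 = up_to_50 - cumulative[1]
deaths_50_55 = cumulative[2] - up_to_50
deaths_50_51 = up_to_51 - up_to_50
print(deaths_45_50, deaths_50_55, deaths_50_51)
```

Because the estimates are obtained as differences of one interpolated cumulative figure, the two five-year numbers necessarily add up to the recorded ten-year total.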

Simple Formula for Halving a Group. 

24.18. The problem of estimating the numbers in the two five-year
groups of which a ten-year group is composed occurs so often, that it is
worth while deriving a simple second-difference formula for the purpose.
Let u's denote numbers in five-year groups, w's numbers in ten-year
groups; and let δ's and Δ's denote the corresponding differences. For
second differences we need only consider three consecutive ten-year groups.
From Newton's formula we have:

\[
\begin{aligned}
u_0 &= u_0 \\
u_1 &= u_0 + \delta_0^1 & w_0 &= 2u_0 + \delta_0^1 \\
u_2 &= u_0 + 2\delta_0^1 + \delta_0^2 \\
u_3 &= u_0 + 3\delta_0^1 + 3\delta_0^2 & w_1 &= 2u_0 + 5\delta_0^1 + 4\delta_0^2 \\
u_4 &= u_0 + 4\delta_0^1 + 6\delta_0^2 \\
u_5 &= u_0 + 5\delta_0^1 + 10\delta_0^2 & w_2 &= 2u_0 + 9\delta_0^1 + 16\delta_0^2
\end{aligned}
\]

Now write down these values of the w's and difference:

   x.            w_x.                    Δ¹.               Δ².

   0       2u_0 +  δ¹                4δ¹ +  4δ²            8δ²
   1       2u_0 + 5δ¹ +  4δ²         4δ¹ + 12δ²
   2       2u_0 + 9δ¹ + 16δ²

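Whence

\[
\begin{aligned}
\Delta_0^1 &= 4(\delta_0^1 + \delta_0^2) \\
\Delta_0^2 &= 8\,\delta_0^2 \\
\delta_0^2 &= \tfrac{1}{8}\Delta_0^2 \\
\delta_0^1 &= \tfrac{1}{4}\Delta_0^1 - \tfrac{1}{8}\Delta_0^2
\end{aligned}
\]

Hence,

\[
\begin{aligned}
u_2 &= u_0 + 2\delta_0^1 + \delta_0^2 \\
    &= \tfrac{1}{2}w_0 + \tfrac{3}{8}\Delta_0^1 - \tfrac{1}{16}\Delta_0^2 \\
    &= \tfrac{1}{2}w_1 - \tfrac{1}{8}\Delta_0^1 - \tfrac{1}{16}\Delta_0^2 \\
    &= \tfrac{1}{2}w_1 - \tfrac{1}{16}\left(2\Delta_0^1 + \Delta_0^2\right)
\end{aligned}
\]

It will be convenient for practical work to express this directly in terms
of the w's:

\[
\begin{aligned}
2\Delta_0^1 &= 2w_1 - 2w_0 \\
\Delta_0^2 &= w_2 - 2w_1 + w_0 \\
2\Delta_0^1 + \Delta_0^2 &= w_2 - w_0
\end{aligned}
\]

Whence finally,

\[
u_2 = \tfrac{1}{2}\left\{ w_1 + \tfrac{1}{8}(w_0 - w_2) \right\} \qquad (24.10)
\]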

Thus, taking the figures and problem of Example 24.5 again, we have:

\[
w_0 = 18{,}139, \qquad w_1 = 24{,}225, \qquad w_2 = 31{,}496
\]

\[
\begin{aligned}
\tfrac{1}{8}(w_0 - w_2) &= -1{,}669.6 \\
w_1 &= 24{,}225 \\
w_1 + \tfrac{1}{8}(w_0 - w_2) &= 22{,}555.4
\end{aligned}
\]

and half this gives

\[ u_2 = 11{,}278 \]

to the nearest unit, as before. For $u_3$, of course, we have also, as before,
24,225 - 11,278 = 12,947. Equation (24.10) is really equivalent to the
method of Example 24.5, though in that illustration we used three
differences. But the third differences of the numbers "aged over 25 but
under x" are equivalent to the second differences of the numbers in the
successive age-groups.
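Equation (24.10) reduces to a one-line function. The sketch below is an addition; the function name `first_half` and its argument names are hypothetical.

```python
def first_half(w_prev, w_mid, w_next):
    """Estimate, by equation (24.10), of the first five-year half of the middle group.

    w_prev, w_mid, w_next are the numbers in three consecutive ten-year groups.
    """
    return 0.5 * (w_mid + (w_prev - w_next) / 8)


w0, w1, w2 = 18139, 24225, 31496          # groups 35-, 45-, 55- of Example 24.5

u2 = round(first_half(w0, w1, w2))        # deaths at 45-50
u3 = w1 - u2                              # deaths at 50-55, by subtraction
print(u2, u3)
```

The second half is taken by subtraction so that the two estimates always reproduce the ten-year total exactly.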


Graduation. 

24.19. If a graph is drawn showing the numbers of either sex living
at each single year of age, as given in any census which provides data in
such detail, it will be found anything but smooth, showing the oddest
peaks and hollows which repeat themselves, once adult life is reached, at
ages showing the same final digits. Thus, in the Census of England and
Wales there are conspicuous peaks at the round-numbered ages 30, 40, 50,
etc. (last birthday), and hollows or deficiencies at the ages ending with 1
and, less emphatically, at the ages ending with 7. With returns from less
educated populations, the phenomenon may become almost ludicrous, e.g.
in a certain Indian census sample-count:

  Age Last Birthday.    Number of Males.
         29                    927
         30                 12,294
         31                    652
         32                  2,058
         33                    672
         34                    892
         35                  7,723
         36                  1,137
         37                    870
         38                  1,362
         39                    467
         40                 10,391
         41                    460


Now whatever irregularities might occur in the true figures, we may be
quite certain that they should not show errors that are simply a function
of the final digit of the age. We would prefer, therefore, to eliminate these
errors. We could do so, somewhat roughly, by drawing a graph as
suggested and sweeping a clean curve through the rather scattered and
irregular points given by the data, subsequently reading off smoothed or
graduated figures from the curve. The graphic process has many points to
recommend it, but is very dependent on personal skill and judgment. It
would be convenient to use a more "mechanical" process that anyone
could apply and be sure of obtaining the same results if he used the same
process. It would be quite possible to fit polynomials to the data by the
methods of Chapter 17, but this would in general entail a great deal of
labour and would not necessarily lead to satisfactory results, e.g. with such
highly erratic data as those above. More suitable processes can be
founded on the method of differences, and the general idea of them all is
quite simple, though the details may vary greatly and the practical working
of some of them become rather complex. All methods begin by assuming
that the totals of certain age-groups (five-year or ten-year age-groups as
a rule) are reasonably accurate. These totals can then be redistributed
over single years of age by the elementary process of Examples 24.5 and
24.6, or the procedure can be in some way elaborated. We shall illustrate
only the simple process.

Example 24.7. The English Census of 1911 gives the following numbers
of males in the three age-groups stated. Obtain graduated numbers at
single years of age for the decade 40 to 49.

  Age-group.      Number.
     30-         2,637,304
     40-         2,001,178
     50-         1,376,236


As before, we form the sum of these numbers step by step from the
top and then take differences.

  Exact     Sum of Numbers      \Delta^1 (+).    \Delta^2 (-).    \Delta^3 (+).
  Age.         from 30.
   30                  0         2,637,304         636,126          11,184
   40          2,637,304         2,001,178         624,942
   50          4,638,482         1,376,236
   60          6,014,718

We now, taking 30 as our zero, require to interpolate at 1.1, 1.2, 1.3, etc.
to 1.9. The coefficients of the several differences in the successive applications
of Newton's formula are:

   \Delta^1.     \Delta^2.     \Delta^3.
     1.1         +0.055        -0.0165
     1.2         +0.12         -0.032
     1.3         +0.195        -0.0455
     1.4         +0.28         -0.056
     1.5         +0.375        -0.0625
     1.6         +0.48         -0.064
     1.7         +0.595        -0.0595
     1.8         +0.72         -0.048
     1.9         +0.855        -0.0285


The results, with the known numbers to age 40 and to age 50 added,
are as given in the second column below, and in the fourth column they
are differenced to obtain the graduated numbers at each year of age, the
total of which must agree with the observed total in the ten-year group.

     1.              2.                  3.              4.
   Exact     Sum of Population       Age Last        Graduated
   Age.      from 30 to Age          Birthday.        Number.
                 Stated.
    40          2,637,304               40            228,559
    41          2,865,863               41            222,209
    42          3,088,072               42            215,870
    43          3,303,942               43            209,542
    44          3,513,484               44            203,226
    45          3,716,710               45            196,920
    46          3,913,630               46            190,626
    47          4,104,256               47            184,344
    48          4,288,600               48            178,071
    49          4,466,671               49            171,811
    50          4,638,482
                                      Total         2,001,178
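
The arithmetic of Example 24.7 can be reproduced in a few lines. What follows is a sketch under stated assumptions, not a prescribed routine: the function names `leading_differences` and `newton_forward` are illustrative, and exact fractions are used merely to keep rounding out of the intermediate steps. Only the three group totals come from the example.

```python
from fractions import Fraction


def leading_differences(values):
    """Return the leading differences (first, second, ...) of an equally spaced series."""
    diffs, row = [], list(values)
    while len(row) > 1:
        row = [b - a for a, b in zip(row, row[1:])]
        diffs.append(row[0])
    return diffs


def newton_forward(u0, diffs, x):
    """Newton's formula: u0 + x*D1 + x(x-1)/2! * D2 + x(x-1)(x-2)/3! * D3 + ..."""
    total, coeff = Fraction(u0), Fraction(1)
    for k, d in enumerate(diffs, start=1):
        coeff = coeff * (x - (k - 1)) / k  # builds x, x(x-1)/2, x(x-1)(x-2)/6, ...
        total += coeff * d
    return total


group_totals = [2_637_304, 2_001_178, 1_376_236]  # age-groups 30-, 40-, 50-

# Sums from age 30 up to exact ages 30, 40, 50, 60.
cumulative = [0]
for t in group_totals:
    cumulative.append(cumulative[-1] + t)

diffs = leading_differences(cumulative)

# Interpolate the sum at exact ages 40, 41, ..., 50, i.e. x = 1.0, 1.1, ..., 2.0.
sums = [round(newton_forward(cumulative[0], diffs, Fraction(10 + k, 10)))
        for k in range(11)]

# Differencing the sums gives the graduated numbers at ages 40 to 49 last birthday.
graduated = [b - a for a, b in zip(sums, sums[1:])]
print(graduated)
print(sum(graduated))
```

Because the interpolated sums at exact ages 40 and 50 are the observed ones, the graduated numbers necessarily add up to the observed total for the decade.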








Below, these figures are compared with the actual returns at the single
years of age and with two other graduations: (1) A graduation given in
the Census report and prepared by Mr George King, F.I.A., based on certain
quinquennial age-groups. (2) A graduation using analogous methods, but
based on ten-year age-groups, made at a later date in the Government
Actuary's Department, and reproduced by permission. The methods are
described in rather more detail below.


     1.            2.             3.              4.               5.
  Age Last       Census       Graduation        King's         Graduation
  Birthday.     Numbers.        Above.       Graduation,          K_2.
                                                 K_1.
     40          202,600        228,559         231,070          231,307
     41        [illegible]      222,209         223,721          225,150
     42          226,889        215,870         216,550          210,233
     43          100,201        209,542       [illegible]        212,785
     44          190,010        203,226         202,143          200,109
     45          202,408        196,920       [illegible]        144,442
     46          181,881        190,626         188,610          192,001
     47          170,713        184,344         182,177          185,883
     48          180,271        178,071         170,991          170,105
     49          172,770        171,811         171,580          172,504
   Total       2,001,178      2,001,178       1,997,707        2,024,755


If we compare the closeness of fit of the several graduations to the
Census returns by adding up the differences, observed number less graduated
number, without regard to their sign, and expressing this total as a
percentage of the population (2,001,178), it will be found that our graduation
gives a percentage deviation of [illegible], King's graduation (K_1) a
percentage deviation of [illegible], and the graduation K_2 a percentage deviation of
[illegible], figures which do not differ very largely. It will be noticed, however,
that both the K graduations give, over the range considered, a small
biased error, the total population over the ten years being too small for
K_1 and too large for K_2. As regards the deviations of the several graduations
from one another, the percentage deviation of our graduation from
K_1 is 0.64 and from K_2 1.18, reckoned in each case on the true total population,
and the percentage deviation of K_2 from K_1 is 1.35, reckoned on the
K_1 total. At some individual ages the differences run up to nearly 2 per
cent. This is a warning to the student that while it is true that the use
of any one of these methods by different workers must, unlike the use of the
graphic method, lead to the same result, yet the choice of different methods
may lead to results almost, if not quite, as divergent as those obtained by
different users of the graphic process. Graduated numbers of hundreds of
thousands carried to the last unit suggest a degree of precision much
higher than exists.

There is evidently a certain imperfection in the elementary method we
have used. If we employed the same method to graduate the numbers at ages
30 to 39, using the numbers in the three ten-year age-groups 20-, 30-, 40-,
there would be a discontinuity at 40, for the two graduated series would
be given by arcs of distinct polynomials. The discontinuity might not
be conspicuous, but it would be there and would probably be brought out
by differencing. To get over this, at least in part, a simple adjustment
can be used. Continue the graduated series for 30 to 39 over the next few
years of age, say to 42. Also continue our series for 40 to 49 backwards to
37. Over the six years 37 to 42 we then have two graduated values at
each age, and these may then be averaged with weights which gradually
throw the weight from the earlier series on to the later, say such simple
weights as 6 to 1, 5 to 2, 4 to 3, 3 to 4, 2 to 5, 1 to 6. We have also paid no
particular attention to the choice of the limits of our ten-year age-group.
Of course it might happen that the numbers were only compiled in ten-year
groups like 20-, 30-, 40-, etc., and then there would be no choice.
But if the figures are given at single years, the choice is at our disposal,
and it may be that we have not chosen wisely. Part of the excess at the
peak figure is probably drawn from lower ages, and it might have been
better to keep the "peak" at the round-number ages well inside the group,
e.g. by compiling totals for the decades 35-, 45-, etc., rather than those used.
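
The weighted averaging over the overlap may be sketched as follows. This is an illustration only: the function name `blend_overlap` is an assumption, and the two short series are hypothetical figures chosen so that the result is obvious, not graduated values from the Census.

```python
from fractions import Fraction


def blend_overlap(earlier, later):
    """Average two overlapping graduated series with linearly shifting weights.

    For an overlap of n ages the weights run n:1, (n-1):2, ..., 1:n, so that
    the earlier series dominates at the start and the later one at the end.
    """
    n = len(earlier)
    if len(later) != n:
        raise ValueError("the two series must cover the same ages")
    return [Fraction((n - k) * a + (k + 1) * b, n + 1)
            for k, (a, b) in enumerate(zip(earlier, later))]


# Hypothetical overlap covering ages 37 to 42 (six ages, weights 6:1 ... 1:6).
earlier_series = [100, 100, 100, 100, 100, 100]  # continued forward from the 30-39 series
later_series = [107, 107, 107, 107, 107, 107]    # continued backward from the 40-49 series
blended = blend_overlap(earlier_series, later_series)
print([int(v) for v in blended])  # [101, 102, 103, 104, 105, 106]
```

The blended values pass smoothly from the level of the earlier series to that of the later, which is the whole purpose of the adjustment.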

Mr King, in the Census graduation, used five-year age-groups as his
basis, and chose the limits 4-8, 9-13, 14-18, etc., as probably giving the
totals nearest the truth. Taking these five-year totals in successive sets
of three, he used the precise procedure of our Example 24.6 to determine
a graduated figure for the central year of the fifteen, e.g. the three groups
covering ages 4-18 would give a graduated number at age 11, the three
covering ages 9 to 23 would give a graduated number at age 16, and so
on. But here his process broke away. Taking four consecutive graduated
numbers five years apart and determined in this way as "pivotal values,"
he used the method of differences to determine a polynomial of the third
order not passing through the four points u_0, u_1, u_2, u_3, but subjected to
the four conditions (1) that it should pass through the two points u_1
and u_2, (2) that at u_1 and u_2 it should have a common tangent with the
corresponding arc determined from the next (overlapping) set of pivotal
values. In this way continuity was assured, but equality of observed
and graduated totals for the five-year groups was lost. (The process
used was a simplification of the process of osculatory interpolation, by which
two arcs meeting at a point are given not only a common tangent but also
a common radius of curvature. It might be called "tangential interpolation.")
The desirability of using five-year groups may be questioned.
It is true that ten-year groups are rather large, but the errors that we are
trying to eliminate are definitely functions of the ten final digits, and
however the limits are chosen there is likely to remain a systematic
difference between the adjacent groups of successive pairs if five-year
groups are used.


The test of K_2, in which an analogous process was used but based
on the ten-year age-groups 5-14, 15-24, etc., was therefore of interest.
Over the range of [illegible]-80 years the differences between K_1 and K_2 gave a
smoothly running cyclical curve with a tendency towards a period of
ten years, as might have been expected.

The simple process given in Example 24.7 is applicable throughout
the bulk of life, but not at the two ends of the series, where special tricks
of the trade have to be employed. The difficulty of interpolating in a
"tail," where the numbers are slowly approaching zero, has already been
pointed out. For graduation these difficulties are increased, and it is
often best to drop the method of differences altogether and use some
special process, such as assuming a law of decrease or fitting the tail of a
frequency-distribution.

Inverse Interpolation. 

24.20. By interpolation we determine the value of the function for
a given value of the variable. If we are given the value of the function
and find the corresponding value of the variable, we are performing
inverse interpolation. The student has carried out the process, in a
form corresponding to simple interpolation, whenever he has determined
the number corresponding to a given logarithm by the use of a table of
logarithms, not a table of antilogarithms. If we need only take first
differences into consideration, the process is, in fact, very simple. From
Newton's formula we have

$$u_x = u_0 + x\Delta_0^1$$

whence

$$x = \frac{u_x - u_0}{\Delta_0^1} \tag{24.11}$$

where u_0 will naturally be taken as the tabulated value next below u_x.
If we must take second differences also into account, we have

$$u_x = u_0 + x\Delta_0^1 + \tfrac{1}{2}x(x-1)\Delta_0^2$$

which gives the quadratic for x

$$\tfrac{1}{2}\Delta_0^2\,x^2 + \left(\Delta_0^1 - \tfrac{1}{2}\Delta_0^2\right)x - (u_x - u_0) = 0 \tag{24.12}$$

or, solving,

$$x = -\frac{2\Delta_0^1 - \Delta_0^2}{2\Delta_0^2} \pm \sqrt{\frac{2(u_x - u_0)}{\Delta_0^2} + \left(\frac{2\Delta_0^1 - \Delta_0^2}{2\Delta_0^2}\right)^2} \tag{24.13}$$

The sign to be taken for the square root will be evident on carrying out
the arithmetic.

This is not always a very convenient expression to use, the solution
(compare Example 24.8 below) being given as a comparatively small
difference between two large quantities. If x_1 is the approximate solution
given by first differences, we can replace x in equation (24.12) by x_1 + h
and solve for the correction h on the assumption that h^2 may be neglected.
This gives

$$h = \frac{x_1(1 - x_1)\Delta_0^2}{2x_1\Delta_0^2 + 2\Delta_0^1 - \Delta_0^2} = \frac{x_1(1 - x_1)\,p}{2 + (2x_1 - 1)\,p} \tag{24.14}$$

where

$$p = \frac{\Delta_0^2}{\Delta_0^1} \tag{24.15}$$

If we may further assume that p is small, this reduces to

$$h = \tfrac{1}{2}x_1(1 - x_1)\,p \tag{24.16}$$

Obtaining a first approximation from first differences, we can use (24.16)
to get a second approximation, then insert this second approximation in
(24.16) and get a third approximation, and so on until the process of
approximation makes no further difference. But note the assumption
made that p is small.

Example 24.8. To find from the area-table of the normal curve
(Appendix Table 2, p. 532) the approximate value of the quartile
deviation, i.e. the value of x/\sigma for which A = 0.75.

The data are:

   x/\sigma.       A.        \Delta_0^1.     \Delta_0^2.
     0.6        0.72575      +0.03229       -0.00219

Hence,

$$u_x - u_0 = 0.02425$$

and the first approximation to x by first differences only is

$$x_1 = \frac{0.02425}{0.03229} = 0.7510 \text{ interval} = +0.07510$$

or measured from the zero of the scale, the first approximation to the
quartile deviation is 0.67510.

Turning now to the quadratic (24.13), the solution is

$$x = 15.2443 - 14.4997 = 0.7446 \text{ interval} = 0.07446$$

the sign of the root having evidently to be taken as negative. Using
second differences, then, our approximation to the quartile deviation is

0.67446

The true value to five places is

0.67449

so the use of second differences only has left an error in the last digit.

Let us see how the suggested process of approximation would have
worked. From (24.16):

$$h = -0.033911 \times 0.751 \times 0.249 = -0.00634$$
$$x_1 = 0.751$$
$$x_2 = 0.74466$$

Now taking x_2 as the second approximation:

$$h = -0.033911 \times 0.74466 \times 0.25534 = -0.00645$$
$$x_1 = 0.751$$
$$x_3 = 0.74455$$

If we repeat the same process again, x_4 = 0.74455, which is the same as x_3,
so it is no use going further, and 0.67446 is as close as we can get.
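
The three routes to the inverse interpolate (first differences, the quadratic, and successive approximation) may be compared by a short computation. The sketch below is illustrative only: the function names are assumptions, and the rule of taking the root nearest the first-difference estimate is one convenient way of making "the sign evident", not a rule laid down in the text. The data are those of Example 24.8.

```python
import math


def first_difference_estimate(target, u0, d1):
    """x = (u_x - u0) / D1, using first differences only."""
    return (target - u0) / d1


def quadratic_estimate(target, u0, d1, d2):
    """Solve the second-difference quadratic, keeping the root nearest the linear estimate."""
    b = (2 * d1 - d2) / (2 * d2)
    root = math.sqrt(2 * (target - u0) / d2 + b * b)
    x1 = first_difference_estimate(target, u0, d1)
    return min((-b + root, -b - root), key=lambda x: abs(x - x1))


def iterative_estimate(target, u0, d1, d2, tol=1e-9, max_iter=100):
    """Successive approximation: x1 + h, with h = x(1 - x)p/2 recomputed from the latest x."""
    x1 = first_difference_estimate(target, u0, d1)
    p = d2 / d1
    x = x1
    for _ in range(max_iter):
        x_next = x1 + 0.5 * x * (1 - x) * p
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x


# Data of Example 24.8: A at x/sigma = 0.6 and its leading differences.
u0, d1, d2 = 0.72575, 0.03229, -0.00219
x_linear = first_difference_estimate(0.75, u0, d1)
x_quad = quadratic_estimate(0.75, u0, d1, d2)
x_iter = iterative_estimate(0.75, u0, d1, d2)
quartile = 0.6 + 0.1 * x_quad  # tabular interval is 0.1
print(round(x_linear, 4), round(x_quad, 4), round(x_iter, 4), round(quartile, 5))
```

The iteration converges to the root of the quadratic itself, so the two second-difference methods must agree in the end; the iteration merely avoids the subtraction of two large, nearly equal quantities.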

If third and higher orders of difference are brought into account, we 
have an equation of higher degree than the second, which can be solved 
by Newton’s method of approximation, but the student will find more 
direct methods given in advanced works. 

Estimation of the Position of a Maximum. 

24.21. In this and the following problem an elementary knowledge
of the calculus is assumed; the student who does not know the calculus
may nevertheless find the results useful.

Suppose we are given three equidistant ordinates u_0, u_1, u_2, at 0, 1
and 2. Required to find the position of the maximum of the parabola
passing through the tops of the ordinates. We have:

$$u_x = u_0 + x\Delta_0^1 + \tfrac{1}{2}x(x-1)\Delta_0^2$$

Differentiating with respect to x and equating to zero, the abscissa of the
maximum is given by

$$\Delta_0^1 + \tfrac{1}{2}(2x - 1)\Delta_0^2 = 0$$

or

$$x = 0.5 - \frac{\Delta_0^1}{\Delta_0^2} \tag{24.17}$$

Very often, perhaps most frequently, our data are not ordinates but
rather areas; e.g. if we want to estimate roughly the position of the mode,
our data will be the total frequencies in three successive class-intervals,
not the central ordinates of those intervals. We should then, as in Example
24.5, form the sum of these data step by step and take the second differential
of the polynomial passing through the resultant points in order to determine
the mode. Thus, calling the sum w:

   x.     u.                                      x.      Sum w.
   0      $u_0$                                  -0.5     $0$
   1      $u_0 + \Delta_0^1$                      0.5     $u_0$
   2      $u_0 + 2\Delta_0^1 + \Delta_0^2$        1.5     $2u_0 + \Delta_0^1$
                                                  2.5     $3u_0 + 3\Delta_0^1 + \Delta_0^2$

Measuring x from the first of the points in the right-hand half of the table,
the leading differences of w are u_0, \Delta_0^1 and \Delta_0^2, so that

$$w_x = x\,u_0 + \tfrac{1}{2}x(x-1)\Delta_0^1 + \tfrac{1}{6}x(x-1)(x-2)\Delta_0^2$$

Differentiating twice and equating to zero,

$$\Delta_0^1 + (x - 1)\Delta_0^2 = 0$$

or

$$x = 1 - \frac{\Delta_0^1}{\Delta_0^2}$$

Since x is now measured from -0.5, this is the same answer as before. If
we are concerned only with second differences of the data, and not with
differences of any higher order, it does not matter whether our data are
ordinates or areas.

The method must be used with caution; obviously it cannot give at all
a precise result unless the data run smoothly, and if it be used for determining
the mode, may easily give an answer appreciably divergent from that
obtained by fitting a frequency-curve. The following illustration will serve
as a warning:


Example 24.9. The following are the frequencies near the mode in a
distribution of barometer heights. Estimate the position of the mode, (1)
from the first three, (2) from the last three.

  Height (inches).    Frequency.
       29.9             339.5
       30.0             382.5
       30.1             395.5
       30.2             315

Differencing:

  Height (inches).    Frequency.    \Delta^1.    \Delta^2.
       29.9             339.5        +43          -30
       30.0             382.5        +13          -93.5
       30.1             395.5        -80.5
       30.2             315

Taking the first three frequencies and their differences:

$$x = 0.5 + \frac{43}{30} = 1.933 \text{ intervals} = 0.193 \text{ inch}$$

Estimated mode = 30.093

Taking the second three frequencies and their differences:

$$x = 0.5 + \frac{13}{93.5} = 0.639 \text{ interval} = 0.064 \text{ inch}$$

Estimated mode = 30.064

Our two answers therefore differ sensibly from each other, and also
from the value given by a fitted Pearson curve, viz. 30.039.
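
The working of Example 24.9 may be reproduced as below. This is a sketch only: the function name `parabola_peak` and the guard against a non-negative second difference are assumptions added for illustration; the frequencies are those of the example.

```python
def parabola_peak(u0, u1, u2):
    """Abscissa of the maximum of the parabola through three equidistant values.

    Implements x = 0.5 - D1/D2 (equation 24.17), with x measured in class-intervals
    from the position of the first value.
    """
    d1 = u1 - u0
    d2 = (u2 - u1) - (u1 - u0)
    if d2 >= 0:
        raise ValueError("second difference must be negative for a maximum")
    return 0.5 - d1 / d2


heights = [29.9, 30.0, 30.1, 30.2]
freqs = [339.5, 382.5, 395.5, 315.0]
interval = 0.1  # inches

mode_first = heights[0] + interval * parabola_peak(*freqs[0:3])
mode_last = heights[1] + interval * parabola_peak(*freqs[1:4])
print(round(mode_first, 3), round(mode_last, 3))  # 30.093 30.064
```

That the two estimates disagree in the second decimal place is the warning the example is meant to convey: the method is only as good as the smoothness of the three values used.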







Modifying Central Ordinates to Equivalent Areas. 

24.22. Suppose we fit a theoretical frequency-curve to an actual
distribution, and want to determine the "goodness of fit" by the \chi^2
method. We would usually proceed by calculating, from the curve
determined, the ordinates at the centre of each class-interval and taking
these as the frequencies. But this procedure is not exact, for the central
ordinates are not precise measures of the areas. In a class-interval
centred exactly on the mode, for example, the central (maximum) ordinate
obviously gives too large a value for the area. Required, to obtain some
simple formula for modifying the central ordinates so as to give the areas.

We have, by Newton's formula,

$$u_x = u_0 + x\Delta_0^1 + \tfrac{1}{2}(x^2 - x)\Delta_0^2$$

$$\phantom{u_x} = u_0 + \left(\Delta_0^1 - \tfrac{1}{2}\Delta_0^2\right)x + \tfrac{1}{2}\Delta_0^2\,x^2$$

Integrate this expression for the interval round u_1, i.e. between the
limits 0.5 and 1.5, and we will have an expression for the equivalent area,
say w_1:

$$w_1 = u_0 + \Delta_0^1 - \tfrac{1}{2}\Delta_0^2 + \tfrac{13}{24}\Delta_0^2$$

$$\phantom{w_1} = u_0 + \Delta_0^1 + \tfrac{1}{24}\Delta_0^2$$

$$\phantom{w_1} = u_1 + \tfrac{1}{24}\Delta_0^2$$

$$\phantom{w_1} = \tfrac{1}{24}\left(u_0 + 22u_1 + u_2\right) \tag{24.18}$$

The first form of the formula is, in general, the more convenient, but the
second may be the better if correction is wanted only to a single value of u.
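
Formula (24.18) can be verified on any quadratic, for which it should be exact. The sketch below is illustrative: the function name `area_from_ordinates` and the test curve u = x^2 are assumptions, not taken from the text.

```python
from fractions import Fraction


def area_from_ordinates(u0, u1, u2):
    """Area over the unit interval centred on u1, by (24.18): (u0 + 22*u1 + u2) / 24."""
    return Fraction(u0 + 22 * u1 + u2, 24)


# Hypothetical check on u = x^2 with ordinates at x = 0, 1, 2.
approx = area_from_ordinates(0, 1, 4)
# Exact integral of x^2 from 0.5 to 1.5 is (1.5^3 - 0.5^3) / 3.
exact = (Fraction(3, 2) ** 3 - Fraction(1, 2) ** 3) / 3
print(approx, exact)  # 13/12 13/12

# At a maximum the second difference is negative, so the area falls short of the ordinate.
print(area_from_ordinates(3, 4, 3) < 4)  # True
```

The second line of output illustrates the remark made above about a class-interval centred on the mode: the central ordinate overstates the area.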

Example 24.10. Table 24.5 (p. 490) gives in column 2 the calculated
ordinates of a Pearson curve at the centres of the class-intervals. In
columns 3 and 4 are given the first and second differences, and in column 5
are given the corrections \Delta_0^2/24, shifted one line down so as to be on the
same line as the ordinate to be corrected. Finally, in column 6 we have the
sum of the ordinate and the correction, or the area. The totals given at
the foot are simply for the purpose of checking: since columns 2 and 3
both begin and end with zero, the sums of both first and second differences
must be zero. Since column 5 is derived from column 4 by dividing
by 24, its sum should also be zero, but errors of rounding off have made
a very small negative excess. All the corrections are very small; they
are necessarily greatest where the curvature is greatest.

24.23. A few words in conclusion. The process of interpolation, and
still more that of graduation, is almost as much artistic as scientific. No
absolute rules can be laid down, judgment must be used, and it is the
experienced craftsman who is likely to get the best results with the least
labour. If the student turns up his Latin dictionary he will find that
interpolate means not only "to polish up" (polire, to polish), so that
graduation is really the implication of the word, but hence "to corrupt,
to falsify." It will do him no harm to bear this etymological meaning in
mind, and keep a look-out accordingly.




Table 24.5.

     1.          2.          3.           4.           5.          6.
   Class-      Central    \Delta^1.    \Delta^2.   Correction.    Area.
  interval.   Ordinate.
                            0.00        +0.08
     0          0.00       +0.08        +0.70        +0.00         0.00
     1          0.08       +0.78        +3.08        +0.03         0.11
     2          0.86       +3.86        +6.91        +0.13         0.99
     3          4.72      +10.77        +7.18        +0.29         5.01
     4         15.49      +17.95        -0.55        +0.30        15.79
     5         33.44      +17.40       -10.76        -0.02        33.42
     6         50.84       +6.64       -13.70        -0.45        50.39
     7         57.48       -7.06        -7.88        -0.57        56.91
     8         50.42      -14.94        +0.06        -0.33        50.09
     9         35.48      -14.88        +4.37        +0.00        35.48
    10         20.60      -10.51        +4.67        +0.18        20.78
    11         10.09       -5.84        +3.15        +0.19        10.28
    12          4.25       -2.69        +1.64        +0.13         4.38
    13          1.56       -1.05        +0.69        +0.07         1.63
    14          0.51       -0.36        +0.25        +0.03         0.54
    15          0.15       -0.11        +0.08        +0.01         0.16
    16          0.04       -0.03        +0.02        +0.00         0.04
    17          0.01       -0.01        +0.01        +0.00         0.01
    18          0.00        0.00         0.00        +0.00         0.00

   Total      286.02      +57.48       +32.89        +1.36       286.01
                          -57.48       -32.89        -1.37



SUMMARY.

1. The first, second, third, . . . differences of a function are defined
by the equations

$$\Delta_0^1 = u_1 - u_0$$
$$\Delta_0^2 = \Delta_1^1 - \Delta_0^1$$
$$\Delta_0^3 = \Delta_1^2 - \Delta_0^2$$

etc.

the intervals between successive values of the variable x being equal.

2. By means of Newton's formula,

$$u_x = u_0 + x\Delta_0^1 + \frac{x(x-1)}{1\cdot 2}\Delta_0^2 + \frac{x(x-1)(x-2)}{1\cdot 2\cdot 3}\Delta_0^3 + \ldots$$

we can interpolate for the value of u_x.

3. Errors in the values of u become of increasing importance as the
order of the differences increases.

4. For inverse interpolation

$$x = \frac{u_x - u_0}{\Delta_0^1}$$

for first differences;

$$x = -\frac{2\Delta_0^1 - \Delta_0^2}{2\Delta_0^2} \pm \sqrt{\frac{2(u_x - u_0)}{\Delta_0^2} + \left(\frac{2\Delta_0^1 - \Delta_0^2}{2\Delta_0^2}\right)^2}$$

for second differences.

We can also proceed by successive approximation. If x_1 is the approximate
solution by first differences, a closer approximation is x_1 + h, where

$$h = \frac{x_1(1 - x_1)\Delta_0^2}{2\Delta_0^1 + (2x_1 - 1)\Delta_0^2}$$

EXERCISES. 

24.1. In the area table of the normal curve, Appendix Table 2, find the
value of A for x/\sigma = 1.54, noting the successive approximations up to third
differences. Take u_0 at 1.4.

24.2. Find as closely as possible the value of P for \chi^2 = 11.7 from the following
entries in the \chi^2 table ("Tables for Statisticians"): \nu = 17 (n' = 18). Note the
successive approximations and the number of places to which your final answer
is probably trustworthy.

   \chi^2.       P.
     10       0.903610
     11       0.850504
     12       0.800136
     13       0.736180

24.3. From the following entries in the same table for \nu = 24 (n' = 25), estimate
as closely as you can the value of P for \chi^2 = 13. Similarly, estimate the closeness
of your approximation.

   \chi^2.       P.
     30       0.181752
     40       0.021387
     50       0.001416
     60       0.000064


24.4. The following were the deaths of males registered in England and
Wales during the three years 1930, 1931, 1932, at the ages stated. The figures
on the right give the totals of the quinquennial groups which were, on this
occasion, held to give the best totals for determining quinquennial "pivotal
values." Find graduated numbers for the ages 40 to 44 inclusive.

   Age.    Numbers.    Quinquennial Totals.
    35       3394
    36       3505
    37       3501
    38       3947
    39       3998           18,345
    40       4220
    41       4281
    42       5024
    43       4993
    44       5260           23,778
    45       5998
    46       6113
    47       6463
    48       6921
    49       7663           33,158


24.5. Let u_0, u_1, u_2, . . . u_14 be the numbers in fifteen consecutive years of
age, as in Exercise 24.4, and w_0, w_5, w_10 the totals in the three quinquennial groups.
Show that if we want only the graduated figure for u_7 as a "pivotal value," this
may be written down at once from the equation

$$u_7 = 0.2\,w_5 - 0.008\,\Delta^2 w_0$$

(King's formula). Verify by comparison with your answer to Exercise 24.4.

24.6. Generalising the above result, show that if w_0, w_r, w_{2r} are three successive
age-groups of r years each, we have for the graduated central value

$$u_{\frac{3r-1}{2}} = \frac{1}{r}\left\{w_r - \frac{1}{24}\left(1 - \frac{1}{r^2}\right)\Delta^2 w_0\right\}$$

and hence if r become indefinitely great, the central ordinate of the middle group
of three, with areas w_0, w_1, w_2 and common base c, is given by

$$y = \frac{1}{c}\left\{w_1 - \frac{1}{24}\Delta^2 w_0\right\}$$

Verify by finding approximately the central ordinate of the normal curve from
the areas between -0.3 and -0.1, -0.1 and +0.1, +0.1 and +0.3 x/\sigma.

24.7. From the following (abbreviated) entries in the \chi^2 table, \nu = 9 (n' = 10),
estimate the value of \chi^2 for which P = 0.25:

   \chi^2.      P.
     11       0.2757
     12       0.2133
     13       0.1626

24.8. The next table shows a frequency-distribution of 1000 observations,
and also gives the frequencies summed from the top. Estimate (1) the median,
(2) the first decile, (3) the ninth decile, (a) as usual by simple interpolation,
(b) by bringing second differences also into account.

   Interval.    Frequency.      x.     Sum of Frequencies
                                          from 0 to x.
     0-1            28           1            28
     1-2            76           2           104
     2-3           114           3           218
     3-4           141           4           359
     4-5           158           5           517
     5-6           142           6           659
     6-7           119           7           778
     7-8            95           8           873
     8-9            63           9           936
     9-10           33          10           969
    10-11           18          11           987
    11-12            8          12           995
    12-13            2          13           997
    13-14            2          14           999
    14-15            -          15           999
    15-16            1          16          1000
    Total         1000


24.9. The following are the mean temperatures (Fahrenheit) at Greenwich
on three days 30 days apart round the periods of summer maximum and winter
minimum. Estimate the approximate dates and values of the maximum and
minimum.

   Days.      Date.        Temp.       Date.        Temp.
     0      15th June      58.8      16th Dec.      40.7
    30      15th July      63.4      15th Jan.      38.1
    60      14th Aug.      62.5      14th Feb.      39.3

24.10. Taking the value of the central ordinate of the normal curve from
Appendix Table 1, estimate the area between the limits ±0.1 x/\sigma, and verify
your answer from the area table.







REFERENCES. 


Since the publication of the first edition of this book the literature of
Statistics has grown to such an extent that considerations of space alone
would prohibit the inclusion of a complete Bibliography in the present
edition. Fortunately, there now appear, from time to time, two reviews
of recent advances in Theoretical Statistics, one by J. O. Irwin and others
in the Journal of the Royal Statistical Society, the other by P. R. Rider
in the Journal of the American Statistical Association. Both these reviews
conclude with lists of references.

In the following lists we have, therefore, attempted to give references
to more important Papers published prior to 1932 on subjects mentioned
in the text. Some later Papers of special interest, and recent books, have
also been included. For subsequent years the student is referred to the
reviews by Irwin and Rider mentioned above.

The references are arranged in the following manner: First are given
works of general interest on the Theory of Statistics, Probability and
related subjects. Then the chapters of the book are dealt with seriatim.
(This involves certain Papers appearing more than once in the references.)
Next come references to certain tables which facilitate calculation, and
to tables of functions useful in statistical work. Finally some references
are given to Italian statistical literature.

Most of the works cited are to be found in the library of the Royal
Statistical Society.

Books on the Theory of Probability.

The student who wishes to proceed to the more advanced theory of
statistics will find it necessary to have a good working knowledge of the
theory of probability, which lies at the root of most statistical inference
from samples. A comprehensive bibliography of the earlier writings on
the subject is given in J. M. Keynes' book, No. (8), below.

(1) Bachelier, L., Calcul des probabilités, tome I; Gauthier-Villars, Paris, 1912.

(2) Bachelier, L., Le jeu, la chance, et le hasard; Flammarion, Paris, 1914.

(3) Bertrand, J. L. F., Calcul des probabilités; Gauthier-Villars, Paris, 1889.

(4) Bruns, H., Wahrscheinlichkeitsrechnung und Kollektivmasslehre; Teubner,
Leipzig, 1903.

(5) Burnside, W., Theory of Probability; Cambridge University Press, 1928.

(6) Henry, A., Calculus and Probability for Actuarial Students; C. & E. Layton,
London, 1922.

(7) Jeffreys, H., Scientific Inference; Cambridge University Press, 1931.

(8) Keynes, J. M., A Treatise on Probability; Macmillan, London, 1921.

(9) Levy, H., and L. Roth, Elements of Probability; Oxford, The Clarendon Press,
1936.

(10) Mises, R. von, Wahrscheinlichkeit, Statistik und Wahrheit; Springer, Berlin,
1928.

(11) Poincaré, H., Calcul des probabilités; Gauthier-Villars, Paris, 1896.

(12) Venn, J., The Logic of Chance: an Essay on the Foundations and Province of the
Theory of Probability, with especial reference to its Logical Bearings and its
Application to Moral and Social Science and to Statistics; Macmillan, London,
1888. (Out of print.)




Books on the Theory of Statistics and Combination of 
Observations. 

(13) Anderson, O., Einführung in die mathematische Statistik; Wien, Julius Springer,
1935.

(14) Brown, W., and G. H. Thomson, The Essentials of Mental Measurement, 3rd Ed.;
Cambridge University Press, 1925.

(15) Brunt, David, The Combination of Observations, 2nd Ed.; Cambridge University
Press, 1931.

(16) Czuber, E., Wahrscheinlichkeitsrechnung und ihre Anwendung auf Fehlerausgleichung,
Statistik und Lebensversicherung; Teubner, Leipzig, vol. 1, 4th Ed., 1923;
vol. 2, 3rd Ed., 1921.

(17) Czuber, E., Die statistische Forschungsmethode; L. W. Seidel, Wien, 1921.

(18) Darmois, G., Statistique mathématique; Paris, Librairie Octave Doin, 1928.

(19) Elderton, W. Palin, Frequency-curves and Correlation, 2nd Ed.; London,
C. & E. Layton, 1927.

(20) Ezekiel, Mordecai, Methods of Correlation Analysis; John Wiley & Sons, New
York; Chapman & Hall, London, 1930. (Full treatment of methods of computation,
especially the methods that have been developed by American writers
for handling problems with many variables.)

(21) Fisher, Arne, The Mathematical Theory of Probabilities and its Application to
Frequency-curves and Statistical Methods, vol. 1; New York (Macmillan), 1915;
2nd Ed., Enlarged, 1922.

(22) Forcher, Hugo, Die statistische Methode als selbständige Wissenschaft; Leipzig,
1913 (Veit).

(23) Jordan, Charles, Statistique mathématique; Gauthier-Villars, Paris, 1927.

(24) Kohn, Stanislav, Základy Teorie Statistické Metody (Elements of the Theory of
Statistical Method), published by the State Statistical Office of the Czechoslovak
Republic, Prague, 1929. (A solid work of 483 pp.; detailed bibliographies.)

(25) Lexis, W., Abhandlungen zur Theorie der Bevölkerungs- und Moralstatistik;
Fischer, Jena, 1903.

(26) Montessus de Ballore, R. de, Probabilités et Statistiques; Hermann et Cie,
Paris, 1931. (Applications of the binomial series to the fitting of frequency-distributions.)

(27) Steffensen, J. F., Some Recent Researches in the Theory of Statistics and Actuarial
Science; Cambridge University Press, 1930. (The substance of three lectures
delivered in London.)

(28) Tschuprow, A. A., Grundbegriffe und Grundprobleme der Korrelationstheorie;
Teubner, Leipzig, 1925.

(29) Whittaker, E. T., and G. Robinson, The Calculus of Observations; Blackie
& Son, London, 2nd Ed., 1932.

Books on Statistical Method. 

In certain cases the foregoing references also deal with statistical
method. See particularly references (17) and (20).

During recent years interest in statistical method has been evidenced
by the issue of a rapidly increasing number of books on the subject.
Those in the following list will be found useful as supplementing the
present volume:

(30) Day, Edmund E., Statistical Analysis; The Macmillan Co., New York, 1925.

(31) Fisher, R. A., Statistical Methods for Research Workers; Oliver & Boyd,
Edinburgh and London, 6th Ed., 1936.

(32) Kelley, Truman L., Statistical Method; The Macmillan Co., New York, 1923. 

(33) Mises, R. von, Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik
und theoretischen Physik; Deuticke, Wien, 1931.

(34) Niceforo, A., La Méthode statistique; Marcel Giard, Paris, 1925.

(35) Pearson, E. S., The Applications of Statistical Methods to Industrial
Standardisation and Control; British Standards Institution, 1936.

(36) Rietz, H. L., Mathematical Statistics; Open Court Publishing Co., Chicago, 1927.
(A small work, one of a series intended for those who have some mathematical
knowledge but are not specialists. Useful references.)




(37) Rietz, H. L. (edited by), Handbook of Mathematical Statistics; Houghton Mifflin
Co., Boston, 1924.

(38) Shewhart, W. A., The Economic Control of Quality of the Manufactured Product;
D. van Nostrand Co., New York, 1931; Macmillan, London.

(39) Tippett, L. H. C., The Methods of Statistics; Williams & Norgate, Ltd., London,
1931. (Useful to the student already possessing some knowledge but who
wants an introduction to the methods of R. A. Fisher, analysis of variance, etc.
Illustrations mainly biological.)

(40) Westergaard, H., and H. C. Nybølle, Grundzüge der Theorie der Statistik;
Fischer, Jena, 1928.

Vital Statistics. 

(41) Newsholme, Sir Arthur, The Elements of Vital Statistics , Revised Edition; 

Allen & Unwin, London, 1928. 

(42) Pearl, R., Introduction to Medical Biometry and Statistics; W. B. Saunders Co.,
Philadelphia and London, 2nd Ed., Enlarged, 1930.

(43) Whipple, G. C., Vital Statistics, 2nd Ed.; Wiley & Sons, New York; Chapman &
Hall, London, 1928.

(44) Woods, Hilda M., and W. T. Russell, An Introduction to Medical Statistics;
P. S. King & Son, Ltd., London, 1931. (Elementary introduction with reference
to statistical methods in general.)


Applications of Statistical Method to Engineering Problems. 

This is also a branch on which much work has been done of recent
years, but it is one with which we are so wholly unfamiliar that we cannot
undertake to give any detailed bibliography. The following books may
be found useful, and will give references:

(45) Becker, R., H. Plaut and I. Runge, Anwendungen der mathematischen Statistik
auf Probleme der Massenfabrikation; Julius Springer, Berlin, 1927. (Reprint,
1930.)

(46) Fry, T. C., Probability and its Engineering Uses; London, Macmillan & Co.;
New York, D. van Nostrand Co., 1928.

(47) Kohlweiler, Emil, Statistik im Dienste der Technik; R. Oldenbourg, München
und Berlin, 1931.

The “Reprints” of the Bell Telephone Laboratories, Incorporated, New
York, include a number coming under the present head. Mention may
be made in particular of Reprint B 297 (reprinted from the Journal of the
Franklin Institute, vol. 205, 1928): “Economic Aspects of Engineering Applications
of Statistical Methods,” by W. A. Shewhart, with a bibliography.

See also the series of Supplements to the Journal of the Royal Statistical
Society (Industrial and Agricultural Research Section).


Applications of Statistical Method to Agricultural Experiment. 

The literature on this subject is enormous. For the general principles
of the technique developed in recent years, see:

(48) Wishart, J., and H. G. Sanders, Principles and Practice of Field Experimentation;
Empire Cotton Growing Corporation, London, 1935.

Reference may also be made to R. A. Fisher's book, ref. (31) above, and his
article on “The Arrangement of Field Experiments” in the Journal of the
Ministry of Agriculture, vol. 33, 1926-27, p. 503.

See also the series of Supplements to the Journal of the Royal Statistical 
Society (Industrial and Agricultural Research Section). 




INTRODUCTION. 

The History of the Words “Statistics,” “Statistical.”

(49) John, V., Der Name Statistik; Weiss, Berne, 1883. A translation in Jour. Roy.
Stat. Soc. for same year.

(50) Yule, G. U., “The Introduction of the Words ‘Statistics,’ ‘Statistical,’ into the
English Language,” Jour. Roy. Stat. Soc., vol. 68, 1905, p. 391.


The History of Statistics in General. 

Several works on theory of statistics include short histories, e.g. H. Westergaard's
Die Grundzüge der Theorie der Statistik (Fischer, Jena, 1890), and P. A.
Meitzen's Geschichte, Theorie und Technik der Statistik (new ed., 1903; American
translation by R. P. Falkner, 1891). There is no detailed history in English,
but the article “Statistics” in the Encyclopædia Britannica (11th ed.) gives a
very slight sketch, and the biographical articles in Palgrave's Dictionary of
Political Economy are useful. Reference may also be made to

(51) Gabaglio, Antonio, Teoria generale della statistica, 2 vols.; Hoepli, Milano,
2nd Ed., 1888. (Vol. 1, Parte storica.)

(52) Hotelling, H., “British Statistics and Statisticians Today,” Jour. Amer. Stat.
Assoc., vol. 25, 1930, p. 186.

(53) Hull, C. H., The Economic Writings of Sir William Petty, together with the
Observations on the Bills of Mortality more probably by Captain John Graunt; Cambridge
University Press, 2 vols., 1899.

(54) John, V., Geschichte der Statistik, 1te Teil, bis auf Quetelet; Enke, Stuttgart,
1884. (All published; the author died in 1900. By far the best history of
statistics down to the early years of the nineteenth century.)

(55) Koren, John, The History of Statistics, their Progress and Development in Many
Countries; Macmillan Co. (New York), 1918.

(56) Mohl, Robert von, Geschichte und Litteratur der Staatswissenschaften, 3 vols.;
Enke, Erlangen, 1855-58. (For history of statistics see principally latter half
of vol. 3.)

(57) Walker, Helen M., Studies in the History of Statistical Method; Baltimore,
Williams & Wilkins Co., 1929. (Most detailed on recent history: chapters on
the Normal Curve, Moments, Percentiles, Correlation, Spearman's Theory of
Two Factors for Intelligence, Statistics as a Subject of Instruction in American
Universities, and the Origin of certain Technical Terms. Useful bibliographies.)

(57a) Westergaard, H., Contributions to the History of Statistics; P. S. King & Son,
1932.


History of Theory of Statistics. 

Somewhat slight information is given in the general works cited.
From the purely mathematical side the following are important:

(58) Pearson, Karl, “Historical Note on the Origin of the Normal Curve of Errors,”
Biometrika, vol. 16, 1924, p. 402.

(59) Pearson, Karl, “Notes on the History of Correlation,” Biometrika, vol. 13,
1920, p. 25.

(60) Pearson, Karl, “The Contribution of Giovanni Plana to the Normal Bivariate
Frequency Surface,” Biometrika, vol. 20A, 1928, p. 295.

(61) Pearson, Karl, “James Bernoulli's Theorem,” Biometrika, vol. 17, 1925, p. 201.

(62) Pearson, Karl, “Historical Note on the Distributions of Standard Deviations of
Samples,” Biometrika, vol. 23, 1931, p. 416.

(63) Todhunter, I., A History of the Mathematical Theory of Probability from the time
of Pascal to that of Laplace; Macmillan, 1865.

See also Karl Pearson, The Life, Letters and Labours of Francis Galton, vol. 2,
Chapter 13; Cambridge University Press, 1935; and vol. 3A, Chapter 14.

A classified survey of the statistical work of the late Karl Pearson will be
found in the Obituary by G. Udny Yule: “Obituary Notices of Fellows of the
Royal Society,” No. 5, December 1936.





History of Official Statistics. 

(64) Bertillon, J., Cours élémentaire de statistique; Société d'éditions scientifiques,
1895. (Gives an exceedingly useful outline of the history of official statistics
in different countries.) See also (55).

CHAPTER 1. Theory of Attributes: Notation and Terminology.

(65) Jevons, W. Stanley, “On a General System of Numerically Definite Reasoning,”
Memoirs of the Manchester Lit. and Phil. Soc., 1870. Reprinted in Pure Logic
and other Minor Works; Macmillan, 1890.

(66) Yule, G. U., “On the Association of Attributes in Statistics, etc.,” Phil. Trans.
Roy. Soc., Series A, vol. 194, 1900, p. 257.

(67) Yule, G. U., “On the Theory of Consistence of Logical Class-frequencies and its
Geometrical Representation,” Phil. Trans. Roy. Soc., Series A, vol. 197, 1901,
p. 91.

(68) Yule, G. U., “Notes on the Theory of Association of Attributes in Statistics,”
Biometrika, vol. 2, 1903, p. 121. (The first three sections of (68) are an abstract
of (66) and (67). The remarks made as regards the tabulation of class-frequencies
at the end of (66) should be read in connection with the remarks made
at the beginning of (67) and in this chapter: cf. footnote on p. 94 of (67).)

Material has been cited from, and reference made to the notation used in:

(69) Warner, F., and Others, “Report on the Scientific Study of the Mental and
Physical Conditions of Childhood”; published by the Committee, Parkes
Museum, 1895.

(70) Warner, F., “Mental and Physical Conditions among Fifty Thousand Children,
etc.,” Jour. Roy. Stat. Soc., vol. 59, 1896, p. 125.


CHAPTER 2. Consistence of Data. 

(71) Boole, G., Laws of Thought, 1854 (chapter 19, “Of Statistical Conditions”).

(72) Morgan, A. de, Formal Logic, 1847 (chapter 8, “On the Numerically Definite
Syllogism”).

Refs. (71) and (72), together with (65), are the classical works with respect
to the general theory of numerical consistence. The student will find the two
above difficult to follow on account of their special notation, and, in the case of
Boole's work, the special method employed.

(73) Yule, G. U., “On the Theory of Consistence of Logical Class-frequencies and its
Geometrical Representation,” Phil. Trans., Series A, vol. 197, 1901, p. 91. (Deals
at length with the theory of consistence for any number of attributes, using the
notation of the present chapters.)

CHAPTER 3. Association of Attributes.

(74) Greenwood, M., and G. U. Yule, “The Statistics of Anti-typhoid and
Anti-cholera Inoculations, and the Interpretation of Such Statistics in General,”
Proc. Roy. Soc. of Medicine, vol. 8, 1915, p. 113. (Cited for the discussion of
association coefficients in §4, and the conclusion that none of these coefficients
is of much value for comparative purposes in interpreting statistics of the type
considered.)

(75) Lipps, G. F., “Die Bestimmung der Abhängigkeit zwischen den Merkmalen eines
Gegenstandes,” Berichte d. math.-phys. Klasse d. kgl. sächsischen Gesellschaft d.
Wissenschaften; Leipzig, Feb. 1905. (Deals with the general theory of the
dependence between two characters, however classified; the coefficient of
association of 3.15 is suggested independently.)

(76) Pearson, Karl, “On the Correlation of Characters not Quantitatively
Measurable,” Phil. Trans. Roy. Soc., Series A, vol. 195, 1900, p. 1.

(77) Pearson, Karl, and David Heron, “On Theories of Association,” Biometrika,
vol. 9, 1913, pp. 159-332. (A reply to criticisms in ref. (80).)

(78) Yule, G. U., “On the Association of Attributes in Statistics,” Phil. Trans. Roy.
Soc., Series A, vol. 194, 1900, p. 257. (Deals fully with the theory of association;
the association coefficient of 3.15 suggested.)




(79) Yule, G. U., “Notes on the Theory of Association of Attributes in Statistics,”
Biometrika, vol. 2, 1903, p. 121. (Contains an abstract of the principal portions
of (78) and other matter.)

(80) Yule, G. U., “On the Methods of Measuring the Association between Two
Attributes,” Jour. Roy. Stat. Soc., vol. 75, 1912, pp. 579-642. (A critical survey of
the various coefficients that have been suggested for measuring association and
their properties.)


CHAPTER 4. Partial Association. 

(81) Yule, G. U., “On the Association of Attributes in Statistics,” Phil. Trans. Roy.
Soc., Series A, vol. 194, 1900, p. 257. (Deals fully with the theory of partial as
well as of total association, with numerous illustrations: a notation suggested
for the partial coefficients.)

(82) Yule, G. U., “Notes on the Theory of Association of Attributes in Statistics,”
Biometrika, vol. 2, 1903, p. 121. (Cf. especially §§4 and 5 on the theory of
complete independence, and the fallacies due to mixing of records.)


CHAPTER 5. Manifold Classification. 

Contingency. 

(83) Lipps, G. F., “Die Bestimmung der Abhängigkeit zwischen den Merkmalen eines
Gegenstandes,” Berichte der math.-phys. Klasse der kgl. sächsischen Gesellschaft
der Wissenschaften, Leipzig, 1905. (A general discussion of the problems of
association and contingency.)

(84) Pearson, Karl, “On the Theory of Contingency and its Relation to Association
and Normal Correlation,” Drapers' Company Research Memoirs, Biometric
Series I; Dulau & Co., London, 1904. (The memoir in which the coefficient of
contingency is proposed.)

(85) Pearson, Karl, “On a Coefficient of Class Heterogeneity or Divergence,”
Biometrika, vol. 5, 1906, p. 198. (An application of the contingency coefficient
to the measurement of heterogeneity, e.g. in different districts of a country, by
treating the observed frequencies of some quality A1, A2, . . . An in the different
districts as rows of a contingency table and working out the coefficient: the
same principle is also applicable to the comparison of a single district with the
rest of the country.)

(86) Pearson, Karl, “On the Measurement of the Influence of Broad Categories on
Correlation,” Biometrika, vol. 9, 1913, p. 116.

(87) Pearson, Karl, “On the General Theory of Multiple Contingency, with Special
Reference to Partial Contingency,” Biometrika, vol. 11, 1915-17, p. 145.

(88) Pearson, Karl, and J. F. Tocher, “On Criteria for the Existence of Differential
Death-rates,” Biometrika, vol. 11, 1916, p. 159.

(89) Pearson, Karl, and E. S. Pearson, “On Polychoric Coefficients of Correlation,”
Biometrika, vol. 14, 1922, p. 127.

(90) Ritchie-Scott, A., “The Correlation Coefficient of a Polychoric Table,”
Biometrika, vol. 12, 1918, p. 93. (Considers various methods of measuring
association with special reference to 4 × 3-fold classifications.)

(91) Royer, E. B., “A Simple Method for Calculating Mean Square Contingency,”
Annals Math. Stats., vol. 4, 1933, p. 75.


Isotropy. 

(92) Yule, G. U., “On a Property which Holds Good for All Groupings of a Normal
Distribution of Frequency for Two Variables, with applications to the Study
of Contingency Tables for the Inheritance of Unmeasured Qualities,” Proc.
Roy. Soc., Series A, vol. 77, 1906, p. 324. (On the property of isotropy and
some applications.)

(93) Yule, G. U., “On the Influence of Bias and of Personal Equation in Statistics
of Ill-defined Qualities,” Jour. Anthrop. Inst., vol. 36, 1906, p. 325. (Includes
an investigation as to the influence of bias and of personal equation in creating
divergences from isotropy in contingency tables.)






Contingency Tables of Two Rows Only. 

(94) Pearson, Karl, “On a New Method of Determining Correlation between a
Measured Character A and a Character B of which only the Percentage of
Cases wherein B exceeds (or falls short of) a Given Intensity is recorded for
each Grade of A,” Biometrika, vol. 7, 1909, p. 96. (Deals with a measure of
dependence for a common type of table, e.g. a table showing the numbers of
candidates who passed or failed at an examination, for each year of age. The
table of such a type stands between the contingency tables for unmeasured
characters and the correlation table (chap. 11) for variables. Pearson's method
is based on that adopted for the correlation table, and assumes a normal
distribution of frequency (chap. 12) for B.)

(95) Pearson, Karl, “On a New Method of Determining Correlation, when one
Variable is given by Alternative and the other by Multiple Categories,”
Biometrika, vol. 7, 1910, p. 248. (The similar problem for the case in which
the variable is replaced by an unmeasured quality.)


CHAPTER 6. Frequency-Distributions.

(96) Pearson, Karl, “Skew Variation in Homogeneous Material,” Phil. Trans.
Roy. Soc., Series A, vol. 186, 1895, pp. 343-414.

(97) Pearson, Karl, “Cloudiness: Note on a Novel Case of Frequency,” Proc. Roy.
Soc., vol. 62, 1897, p. 287.

(98) Pearson, Karl, “Supplement to a Memoir on Skew Variation,” Phil. Trans.
Roy. Soc., Series A, vol. 197, 1901, pp. 443-459, and Second Supplement,
vol. 216, 1916, p. 429.

(99) Pareto, Vilfredo, Cours d'économie politique, 2 vols.; Lausanne, 1896-97. See
especially tome 2, livre 3, chapter 1, “La courbe des revenus.”

The first four memoirs above are mathematical memoirs on the theory
of ideal frequency-curves, the first being the fundamental memoir, and the
third and fourth supplementary. The elementary student may, however,
refer to them with advantage, on account of the large collection of frequency-
distributions which is given. Without attempting to follow the mathematics,
he may also note that each of our rough empirical types may be divided into
several sub-types, the theoretical division into types being made on different
grounds.

The fifth work (99) is cited on account of the author's discussion of the
distribution of wealth in a community, to which reference was made in 6.22.

A number of curious distributions will also be found in— 

(100) Niceforo, Alfredo, La misura della vita; Turin, Fratelli Bocca, 1928.

In connection with the remarks in 6.7 on the grouping of ages, reference
may be made to the following in which a different conclusion is drawn as to
the best grouping:

(101) Young, Allyn A., “A Discussion of Age Statistics,” Census Bulletin 13, Bureau
of the Census, Washington, U.S.A., 1904.


CHAPTER 7. Averages and Other Measures of Location.

General. 

(102) Fechner, G. T., “Ueber den Ausgangswerth der kleinsten Abweichungssumme,
dessen Bestimmung, Verwendung und Verallgemeinerung,” Abh. d. kgl.
sächsischen Gesellschaft d. Wissenschaften, vol. 18 (also numbered 11 of the
Abh. d. math.-phys. Klasse); Leipzig, 1878, p. 1. (The average defined as
the origin from which the dispersion, measured in one way or another, is a
minimum: geometric mean dealt with incidentally, pp. 18-10.)

(103) Fechner, G. T., Kollektivmasslehre, herausgegeben von G. F. Lipps; Engelmann,
Leipzig, 1897. (Posthumously published: deals with frequency-distributions,
their forms, averages and measures of dispersion in general: includes much
of the matter of (102).)




(104) Zizek, Franz, Die statistischen Mittelwerthe; Duncker und Humblot, Leipzig,
1908; English translation, Statistical Averages, translated with additional
notes, etc., by W. M. Persons; Holt & Co., New York, 1913. (Non-mathematical,
but useful to the economics student for references cited.)

The Geometric Mean. 

(105) Crawford, G. E., “An Elementary Proof that the Arithmetic Mean of any
number of Positive Quantities is Greater than the Geometric Mean,” Proc.
Edin. Math. Soc., vol. 18, 1899-1900.

(106) Edgeworth, F. Y., “On the Method of ascertaining a Change in the Value of
Gold,” Jour. Roy. Stat. Soc., vol. 46, 1883, p. 714. (Some criticism of the
reasons assigned by Jevons for the use of the geometric mean.)

(107) Galton, Francis, “The Geometric Mean in Vital and Social Statistics,” Proc.
Roy. Soc., vol. 29, 1879, p. 365.

(108) Jevons, W. Stanley, A Serious Fall in the Value of Gold ascertained and its
Social Effects set forth; Stanford, London, 1863. Reprinted in Investigations
in Currency and Finance; Macmillan, London, 1884. (The geometric mean
applied to the measurement of price changes.)

(109) Jevons, W. Stanley, “On the Variation of Prices and the Value of the Currency
since 1782,” Jour. Roy. Stat. Soc., vol. 28, 1865. Also reprinted in volume
cited above.

(110) Kapteyn, J. C., Skew Frequency-curves in Biology and Statistics; Noordhoff,
Groningen, and Wm. Dawson, London, 1903. (Contains, amongst other forms,
a generalisation of McAlister's law, see ref. (111).)

(111) McAlister, Donald, “The Law of the Geometric Mean,” Proc. Roy. Soc., vol. 29,
1879, p. 367. (The law of frequency to which the use of the geometric mean
would be appropriate.)


The Mode. 

(112) Doodson, Arthur T., “Relation of the Mode, Median and Mean in Frequency
Curves,” Biometrika, vol. 11, 1915-17, p. 425. (Gives a proof of the relation
noted in 7.27.)

(113) Pearson, Karl, “On the Modal Value of an Organ or Character,” Biometrika,
vol. 1, 1902, p. 260. (A warning as to the inadequacy of mere inspection for
determining the mode.)

(114) Pearson, Karl, “Skew Variation in Homogeneous Material,” Phil. Trans.
Roy. Soc., Series A, vol. 186, 1895, p. 343. (Definition of mode, p. 345.)

(115) Yule, G. U., “Notes on the History of Pauperism in England and Wales, etc.:
Supplementary Note on the Determination of the Mode,” Jour. Roy. Stat.
Soc., vol. 59, 1896, p. 343. (The note deals with elementary methods of
approximately determining the mode: the one-third rule and one other.)


Estimates of Population. 

(116) Waters, A. C., “A Method for estimating Mean Populations in the last
Intercensal Period,” Jour. Roy. Stat. Soc., vol. 64, 1901, p. 293.

(117) Waters, A. C., Estimates of Population; Supplement to Annual Report of the
Registrar-General for England and Wales (Cd. 2618, 1907, p. cxvii).

For the methods formerly used, see the Reports of the Registrar-General of
England and Wales for 1907, pp. cxxxii-cxxxiv, and for 1910, pp. xi-xii.
Estimates are now based on statistics of births, deaths and migrations. Cf.
Snow, ref. (300), for a different method based on the symptoms of growth
such as numbers of births or of houses.


Index-numbers. 

These were incidentally referred to in 7.34. The general theory of
index-numbers and the different methods in which they may be formed
are not considered in the present work. The student will find copious
references to the literature in the following:





(118) Bennett, T. L., “The Theory of Measurement of Changes in the Cost of Living,”
Jour. Roy. Stat. Soc., vol. 83, 1920, p. 455.

(119) Bowley, A. L., “The Influence on the Precision of Index-numbers of the
Correlation between the Prices of Commodities,” Jour. Roy. Stat. Soc., vol. 89, 1926,
p. 300.

(120) Bowley, A. L., Prices and Wages in the United Kingdom, 1914-20; Oxford, 1920
(Clarendon Press).

(121) Bowley, A. L., “The Measurement of Changes in Cost of Living,” Jour. Roy.
Stat. Soc., vol. 82, 1919, p. 343.

(122) Edgeworth, F. Y., “Reports of the Committee appointed for the purpose of
investigating the best methods of ascertaining and measuring Variations in
the Value of the Monetary Standard,” British Association Reports, 1887 (p. 247),
1888 (p. 181), 1889 (p. 133), and 1890 (p. 485).

(123) Edgeworth, F. Y., Article “Index-numbers” in Palgrave's Dictionary of Political
Economy, vol. 2; Macmillan, 1925.

(124) Edgeworth, F. Y., “The Plurality of Index-numbers,” Economic Journal,
vol. 35, 1925, p. 379.

(125) Edgeworth, F. Y., “The Element of Probability in Index-numbers,” Jour.
Roy. Stat. Soc., vol. 88, 1925, p. 557.

(126) Fisher, Irving, “The Best Form of Index-number,” Quart. Pub. Amer. Stat.
Assoc., March 1921, p. 533.

(127) Fisher, Irving, The Making of Index-numbers; Houghton Mifflin Co., Boston and
New York, 1922. (Useful as a repertory of formulas with tests of the results
given on certain American data; otherwise, cf. reviews in Economic Journal,
vol. 33, pp. 90 and 216, and Jour. Roy. Stat. Soc., vol. 86, 1923, p. 424, and
vol. 87, 1924, p. 89.)

(128) Flux, A. W., “The Measurement of Price Changes,” Jour. Roy. Stat. Soc., vol. 84,
1921, p. 167.

(129) Fountain, H., “Memorandum on the Construction of Index-numbers of Prices,”
Board of Trade Report on Wholesale and Retail Prices in the United Kingdom,
1903.

(130) Gini, C., “Quelques considérations au sujet de la construction des nombres
indices des prix, etc.,” Metron, vol. 4, 1924, p. 3.

(131) Knibbs, G. H., “Prices, Price-indexes, and Cost of Living in Australia,”
Commonwealth of Australia, Labour and Industrial Branch, Report No. 1, 1912.

(132) March, L., “Rapport sur les indices de la situation économique,” Bulletin de
l'Institut International de Statistique, t. 21, 1924, pt. 2, p. 8.

(133) March, L., “Les modes de mesure du mouvement général des prix,” Metron,
vol. 1, No. 4, 1921, p. 40.

(134) Marshall, A., Money, Credit and Commerce; Macmillan, London, 1923.

(135) Persons, W. M., “Fisher's Formula for Index-numbers,” Rev. Econ. Statistics,
vol. 3, 1921, p. 103.

(136) Wood, Frances, “The Course of Real Wages in London, 1900-12,” Jour. Roy.
Stat. Soc., vol. 77, 1913-14, p. 1.

(137) Working Classes, Cost of Living Committee, 1918, Report (Cd. 8980, 1918),
H.M. Stationery Office.

For the student of the cost of living in Great Britain the following are
useful:

(138) “Labour Gazette Index-number: Scope and Method of Compilation,” Lab. Gaz.,
March 1920 and Feb. 1921.

(139) “Final Report on the Cost of Living of the Parliamentary Committee of the
Trades Union Congress” (The Committee, 32 Eccleston Sq., London, 1921);
critical notices of the same in the Labour Gazette, Aug. and Sept. 1921; and
review by A. L. Bowley, Econ. Jour., Sept. 1921.

CHAPTER 8. Measures of Dispersion. 

General. 

(140) Fechner, G. T., “Ueber den Ausgangswerth der kleinsten Abweichungssumme,
dessen Bestimmung, Verwendung und Verallgemeinerung,” Abh. d. kgl. sächs.
Ges. d. Wissenschaften, vol. 18 (also numbered vol. 11 of the Abh. d. math.-phys.
Klasse); Leipzig, 1878, p. 1.




Standard Deviation. 

(141) Pearson, Karl, “Contributions to the Mathematical Theory of Evolution
(i. On the Dissection of Asymmetrical Frequency-curves),” Phil. Trans. Roy.
Soc., Series A, vol. 185, 1894, p. 71. (Introduction of the term “standard
deviation,” p. 80.)

Mean Deviation. 

(142) Laplace, Pierre Simon, Marquis de, Théorie analytique des probabilités, 2me
supplément, 1818. (Proof that the mean deviation is a minimum when taken
about the median.)

(143) Trachtenberg, M. I., “A Note on a Property of the Median,” Jour. Roy. Stat.
Soc., vol. 78, 1915, p. 454. (A very simple proof of the same property.)

Method of Percentiles, including Quartiles, etc. 

(144) Galton, Francis, “Statistics by Intercomparison, with Remarks on the Law of
Frequency of Error,” Phil. Mag., vol. 49 (4th Series), 1875, pp. 33-46.

(145) Galton, Francis, Natural Inheritance; Macmillan, 1889. (The method of
percentiles is used throughout, with the quartile deviation as the measure of
dispersion.)

Relative Dispersion. 

(146) Pearson, Karl, “Regression, Heredity and Panmixia,” Phil. Trans. Roy. Soc.,
Series A, vol. 187, 1896, p. 253. (Introduction of “coefficient of variation,”
pp. 276-277.)

(147) Verschaffelt, E., “Ueber graduelle Variabilität von pflanzlichen
Eigenschaften,” Ber. deutsch. bot. Ges., Bd. 12, 1894, pp. 350-355.


Calculation of Mean, Standard Deviation, or of the General 
Moments of a Grouped Distribution. 

We have given a direct method that seems the simplest and best for
the elementary student. A process of successive summation that has
some advantages can, however, be used instead. The student will find
a convenient description with illustrations in:

(148) Elderton, W. Palin, Frequency-curves and Correlation; C. & E. Layton, London,
2nd Ed., 1927.

Effect of Grouping Observations. 

(149) Baten, W. R., “Corrections for Moments of a Frequency-distribution in Two 

Variables,” Ann. Math. Stats. , vol. 2, 1931, p. 309. 

(150) Elderton, W. Palin, “Adjustments for the Moments of J-shaped Curves,”
Biometrika, vol. 25, 1933, p. 179; followed by Karl Pearson, “Note on Mr
Palin Elderton's Corrections to the Moments of J-curves,” ibid., p. 180.

(151) Martin, E. S., “On the Correction for the Moment Coefficients of Frequency-
distributions when the Start of the Frequency is one of the Characteristics to
be Determined,” Biometrika, vol. 26, 1934, p. 12.

(152) Pairman, Eleanor, and Karl Pearson, “On Corrections for the Moment
Coefficients of Limited Range Frequency-distributions when there are Finite or
Infinite Ordinates and any Slopes at the Terminals of the Range,” Biometrika,
vol. 12, 1918-19, p. 231.

(153) Pearse, G. E., “On Corrections for the Moment Coefficients of
Frequency-distributions when there are Infinite Ordinates at One or Both Terminals of the
Range,” Biometrika, vol. 20A, 1928, p. 314.

(154) Pearson, Karl, and Others (editorial), “On an Elementary Proof of Sheppard's
Formulæ for Correcting Raw Moments, and on other allied points,” Biometrika,
vol. 3, 1904, p. 308.

(155) Pearson, Karl, “On the Influence of ‘Broad Categories’ on Correlation,”
Biometrika, vol. 9, 1913, pp. 116-139.




(156) Sheppard, W. F., “On the Calculation of the Average Square, Cube, etc., of a
large number of Magnitudes,” Jour. Roy. Stat. Soc., vol. 60, 1897, p. 698.

(157) Sheppard, W. F., “On the Calculation of the most probable Values of Frequency
Constants for Data arranged according to Equidistant Divisions of a Scale,”
Proc. Lond. Math. Soc., vol. 29, p. 353.

(158) Sheppard, W. F., “The Calculation of Moments of a Frequency-distribution,”
Biometrika, vol. 5, 1907, p. 450.

Coefficient of Variation. 

See ref. (146) above, and 

(159) Wilson, G. S., and Others, “The Bacteriological Grading of Milk,” Special
Report 206 of the Medical Research Council, 1935.

CHAPTER 9. Moments and Measures of Skewness 
and Kurtosis. 

Moments. 

For the introduction of moments and related coefficients and their
use in fitting curves to frequency-distributions, see refs. (216), (217) and
(218) of Chapter 10.

For methods of calculation of moments, see:

(160) Elderton, W. Palin, Frequency-curves and Correlation; C. & E. Layton, London,
2nd Ed., 1927.

For corrections to the moments, see refs. (149)-(158) of Chapter 8.


Skewness. 

See refs. (216), (217) and (218) of Chapter 10, and also— 

(161) Hotelling, H., and L. M. Solomons, “The Limits of a Measure of Skewness,”
Ann. Math. Stats., vol. 3, 1932, p. 141.


Seminvariants. 

(162) Craig, C. C., “On a Property of the Seminvariants of Thiele,” Ann. Math. Stats.,
vol. 2, 1931, p. 154.

(163) Thiele, T. N., “Theory of Observations” (English version reprinted in Ann.
Math. Stats., vol. 2, 1931, p. 165).

See also refs. (416), (421) and (514).


CHAPTER 10. Three Important Theoretical Distributions— 
the Binomial, the Normal and the Poisson. 

(164) Aitken, A. C., “Some Applications of Generating Functions to Normal
Frequency,” Quart. Jour. Math., vol. 2, 1931, p. 130.

(165) Bernoulli, J., Ars conjectandi, opus posthumum: Accedit tractatus de seriebus
infinitis, et epistola Gallice scripta de ludo pilae reticularis, 1713. (A German
translation in Ostwald's Klassiker der exakten Wissenschaften, Nos. 107 and 108.)

For the early classical memoirs on the normal curve or law of error by
Laplace, Gauss and others, see Todhunter's History, ref. (63).

(166) Camp, B. H., “The Normal Hypothesis,” Jour. Amer. Stat. Assoc., vol. 26, March
Supplement, 1931, pp. 222-226.

(167) C'/Tiber, E., Wuhrsrhcmlidikritsreehnuns. j: Teuhner, Leipzig. (Deduction of 

Law of Errors.) 

(168) Edoeworth, F. Y., Art icle on the “Law of Error” in the Encyclopaedia Britannica , 

10th Ed., vol. 28, 1902, p. 280. 




(169) Edgeworth, F. Y., “The Law of Error,” Cambridge Phil. Trans., vol. 20, 1904,
pp. 36-65, 113-141 (and an Appendix, pp. 1-14, not printed in the Cambridge
Phil. Trans., but issued with Reprints). 

(170) Galton, Francis, Natural Inheritance; Macmillan & Co., London, 1889. (Mechanical
method of forming a binomial or normal distribution, chap. 5, p. 63. For
Pearson's generalised machine, see below, ref. (174).) 

(171) Gumbel, E. J., “La distribuzione dei decessi secondo la legge di Gauss,” Giorn.
dell’Ist. Ital. degli Att., vol. 3, 1932, pp. 311-342.

(172) Nixon, J. W., “An Experimental Test of the Normal Law of Error,” Jour.
Roy. Stat. Soc., vol. 76, 1913, pp. 702-706.

(173) Pearson, Karl, “Historical Note on the Origin of the Normal Curve of Errors,”
Biometrika, vol. 14, 1924, p. 402.

(174) Pearson, Karl, “Skew Variation in Homogeneous Material,” Phil. Trans. 

Roy. Soc., Series A, vol. 186, 1895, p. 343.

For the generalised binomial machine, see §1. The memoir deals with
curves derived from the general binomial, and from a somewhat analogous
series derived from the case of sampling from limited material. Supplement
to the memoir, ibid., vol. 197, 1901, p. 443. Second Supplement, ibid., vol.
216, 1916, p. 129. For a derivation of the same curves from a modified standpoint,
ignoring the binomial and analogous distributions, cf. ref. (354).

(175) Sheppard, W. F., “On the Application of the Theory of Error to Cases of Normal
Distribution and Normal Correlation,” Phil. Trans. Roy. Soc., Series A, vol.
192, 1898, p. 101. (Includes a geometrical treatment of the normal curve.)

(176) Yule, G. U., “On the Distribution of Deaths with Age when the Causes of Death
act cumulatively, and similar Frequency-distributions,” Jour. Roy. Stat. Soc.,
vol. 73, 1910, p. 26. (A binomial distribution with negative index, and the
related curve, i.e. a special case of one of Pearson’s curves, ref. (174).)

Poisson’s Distribution. 

(177) Bortkiewicz, L. von, Das Gesetz der kleinen Zahlen; Teubner, Leipzig, 1898.

(178) Bortkiewicz, L. von, “Ueber die Zeitfolge zufälliger Ereignisse,” Bull. de
l’Institut Int. de Stat., tome 20, 2e livr., 1915.

(179) Bortkiewicz, L. von, “Realismus und Formalismus in der mathematischen
Statistik,” Allgemein. Stat. Arch., vol. 9, 1916, p. 225. (Continues the discussion
initiated by the paper of Miss Whitaker, ref. (190).)

(180) Greenwood, M., and G. Udny Yule, “On the Statistical Interpretation of some
Bacteriological Methods employed in Water Analysis,” Journal of Hygiene,
vol. 21, 1917, p. 30. (Applies a criterion developed from Poisson’s limit to
the discrimination of water analyses; numerous arithmetical examples.)

(181) Greenwood, M., and G. U. Yule, “An Enquiry into the Nature of Frequency-distributions
representative of Multiple Happenings, with particular reference
to the Occurrence of Multiple Attacks of Disease or of Repeated Accidents,”
Jour. Roy. Stat. Soc., vol. 83, 1920, p. 255.

(182) Morant, G., “On Random Occurrences in Space and Time when followed by a
Closed Interval,” Biometrika, vol. 13, 1921, p. 309.

(183) Newbold, Ethel M., “A Contribution to the Study of the Human Factor in
the Causation of Accidents,” Industrial Fatigue Research Board, Report No. 34,
1926.

(184) Newbold, Ethel M., “Practical Applications of the Statistics of Repeated

Events, particularly to Industrial Accidents,” Jour. Roy. Stat. Soc., vol. 90, 
1927, p. 487. 

(185) Poisson, S. D., Recherches sur la probabilité des jugements, etc.; Paris, 1837.
(Pp. 205-207.)

(186) Rutherford, E., and H. Geiger, with a note by H. Bateman, “The Probability
Variations in the Distribution of α-particles,” Phil. Mag., Series 6, vol. 20,
1910, p. 698. (The frequency of particles emitted during a small interval of
time follows the law of small chances: the law deduced by Bateman in ignorance
of previous work.)

(187) Soper, H. E., “Tables of Poisson’s Exponential Binomial Limit,” Biometrika,
vol. 10, 1914, pp. 25-35.

(188) “Student,” “On the Error of Counting with a Haemacytometer,” Biometrika,
vol. 5, 1907, p. 351.




(189) “Student,” “An Explanation of Deviations from Poisson’s Law in Practice,”
Biometrika, vol. 12, 1919, p. 211.

(190) Whitaker, Lucy, “On Poisson’s Law of Small Numbers,” Biometrika, vol. 10,
1914, pp. 36-71.


Frequency-distributions in General. 

(191) Baten, W. D., “Frequency Laws for the Sum of n Variables which are subject
to Given Frequency Laws,” Metron, vol. 10, part 3, 1933, p. 75.

(192) Camp, B. H., “Probability Integrals for the Point Binomial,” Biometrika, vol. 16,
1924, p. 163.

(193) Camp, B. H., “Probability Integrals for a Hypergeometrical Series,” Biometrika,
vol. 17, 1925, p. 61.

(194) Charlier, C. V. L., Numerous papers issued from the Astronomical Department
of Lund, 1906-12, especially “Contributions to the Mathematical Theory of
Statistics” (1912).

(195) Charlier, C. V. L., “Researches into the Theory of Probability” (Communications
from the Astronomical Observatory, Lund); Lund, 1906.

(196) Charlier, C. V. L., “A New Form of the Frequency Function,” Meddelande,
Lunds Astronomiska Observatorium, 1928.

(197) Cramer, H., “On some Classes of Series used in Mathematical Statistics,” Den
sjette Skandinaviske Matematikerkongres, Copenhagen, 1928.

(198) Cramer, H., “On the Composition of Elementary Errors,” Skandinavisk
Aktuarietidskrift, 1928.

(199) Cunningham, E., “The ω-Functions, a Class of Normal Functions occurring in
Statistics,” Proc. Roy. Soc., Series A, vol. 81, 1908, p. 310.

(200) Dodd, E. L., “The Frequency Laws of a Function of Variables with given
Frequency Laws,” Annals of Mathematics, vol. 27, 1925, p. 12.

(201) Dodd, E. L., “The Frequency Law of a Function of One Variable,” Bull. Amer.
Math. Soc., vol. 31, 1925.

(202) Dodd, E. L., “On Ordinary Plane and Skew Curves,” Bulletin of the Univ. of
Texas, No. 222, 1912.

(203) Dodd, E. L., “Classification of Sizes and Measures by Frequency Functions,”
Jour. Amer. Stat. Assoc., vol. 26, 1931, p. 277. (A survey: useful references.)

(204) Edgeworth, F. Y., “On the Mathematical Representation of Statistical Data,”
Jour. Roy. Stat. Soc., vol. 79, 1916, p. 1-50; vol. 80, 1917, pp. 65, 266, 411;
vol. 81, 1918, p. 322.

(205) Edgeworth, F. Y., “On the Representation of Statistics by Mathematical
Formulae,” Jour. Roy. Stat. Soc., vol. 61, 1898, p. 670; vol. 62, 1899, p. 125;
vol. 63, 1900, p. 72.

(206) Edgeworth, F. Y., Article on the “Law of Error” in the Encyclopaedia Britannica,
10th Ed., vol. 28, 1902, p. 280.

(207) Edgeworth, F. Y., “The Law of Error,” Cambridge Phil. Trans., vol. 20, 1904,
pp. 36-65, 113-141 (and an Appendix, pp. 1-14, not printed in the Cambridge
Phil. Trans., but issued with Reprints).

(208) Edgeworth, F. Y., “The Generalised Law of Error, or Law of Great Numbers,”
Jour. Roy. Stat. Soc., vol. 69, 1906, p. 497.

(209) Edgeworth, F. Y., “On the Representation of Statistical Frequency by a
Curve,” Jour. Roy. Stat. Soc., vol. 70, 1907, p. 102.

(210) Edgeworth, F. Y., “Untried Methods of Representing Frequency,” Jour. Roy.
Stat. Soc., vol. 87, 1924, p. 571.

(211) Edgeworth, F. Y., “Mr Rhodes’s Curve and the Method of Adjustment,” Jour.
Roy. Stat. Soc., vol. 89, 1926, p. 129. (See ref. (221).)

(212) Frisch, R., “On the Use of Difference Equations in the Study of
Frequency-distributions,” Metron, vol. 10, 1933, part 3, p. 35.

(213) Geary, R. C., “The Frequency-distribution of the Quotient of Two Normal
Variables,” Jour. Roy. Stat. Soc., vol. 93, 1930, p. 442.

(214) Kapteyn, J. C., Skew Frequency-curves in Biology and Statistics; Noordhoff,
Groningen; Wm. Dawson & Sons, London, 1903.

(215) Nixon, J. W., “An Experimental Test of the Normal Law of Error,” Jour.
Roy. Stat. Soc., vol. 76, 1913, pp. 702-706.

(216) Pearson, Karl, “Skew Variation in Homogeneous Material,” Phil. Trans. Roy.
Soc., Series A, vol. 186, 1895, p. 343, and Supplement, vol. 197, 1901, p. 443.




(217) Pearson, Karl, “Das Fehlergesetz und seine Verallgemeinerungen durch
Fechner und Pearson: A Rejoinder,” Biometrika, vol. 4, 1905, p. 169.

(218) Pearson, Karl, “Second Supplement to a Memoir on Skew Variation,” Phil.
Trans. Roy. Soc., Series A, vol. 216, 1916, p. 129.

(219) Pearson, Karl, “Historical Note on the Origin of the Normal Curve of Errors,”
Biometrika, vol. 14, 1924, p. 402.

(220) Perozzo, Luigi, “Nuove Applicazioni del Calcolo delle Probabilità allo Studio
dei Fenomeni Statistici e Distribuzione dei Matrimoni secondo l’Età degli
Sposi,” Mem. della Classe di Scienze morali, etc., Reale Accad. dei Lincei, vol. 10,
Series 3, 1882.

(221) Rhodes, E. C., “On the Generalised Law of Error,” Jour. Roy. Stat. Soc., vol. 88, 

1925, p. 576. 

(222) Rietz, H. L., “On certain Properties of Frequency-distributions obtained by a
Linear Fractional Transformation of the Variates of a given Distribution,”
Ann. Math. Stats., vol. 2, 1931, p. 38.

(223) Romanovsky, V., “Generalisation of some Types of the Frequency-curves of
Professor Pearson,” Biometrika, vol. 16, 1924, p. 106.

(224) Soper, H. E., Frequency Arrays; Cambridge University Press, 1922.

(The above are concerned with the general theory of frequency systems;
the following deal with the forms which are suitable for the representation 
of particular classes of data, e.g. statistics of epidemic diseases, statistics of 
accidents, etc.) 

(225) Brownlee, J., “The Mathematical Theory of Random Migration and Epidemic
Distribution,” Proc. Roy. Soc. Edin., vol. 31, 1910-11, p. 262.

(226) Brownlee, J., “Certain Aspects of the Theory of Epidemiology in Special
Reference to Plague,” Proc. Roy. Soc. Medicine, Sect. Epidemiology and State
Medicine, vol. 11, 1918, p. 85. (The appendix to this paper summarises the
author’s results and those of Sir Ronald Ross; vide infra.)

(227) Greenwood, M., and G. U. Yule, “An Enquiry into the Nature of Frequency-distributions
Representative of Multiple Happenings, with Particular Reference
to the Occurrence of Multiple Attacks of Disease or of Repeated Accidents,”
Jour. Roy. Stat. Soc., vol. 83, 1920, p. 255.

(228) Knibbs, G. H., “The Mathematical Theory of Population,” Appendix A to
vol. 1 of Census of the Commonwealth of Australia. (Contains a full discussion
of the application of various frequency systems to vital statistics.)

(229) Moir, H., “Mortality Graphs,” Trans. Actuarial Soc. America, vol. 18, 1917,
p. 311. (Numerous graphs of mortality rates in different classes and periods.)

(230) Ross, Sir Ronald, “An Application of the Theory of Probabilities to the Study
of a priori Pathometry,” Proc. Roy. Soc., A, vol. 92, 1916, p. 204.

(231) Ross, Sir Ronald, and Hilda P. Hudson, “An Application of the Theory of
Probabilities to the Study of a priori Pathometry,” Pts. 2 and 3, Proc. Roy.
Soc., A, vol. 93, 1917, pp. 212 and 225.


The Resolution of a Distribution compounded of Two 
Normal Curves into its Components. 

(232) Edgeworth, F. Y., “On the Representation of Statistics by Mathematical 

Formulae,” Pt. 2, Jour. Roy. Stat. Soc., vol. 62, 1899, p. 125.

(233) Helguero, Fernando de, “Per la risoluzione delle curve dimorfiche,” Biometrika,
vol. 4, 1905, p. 230. Also memoir under the same title in the Transactions of
the Accademia Reale dei Lincei, Rome, vol. 6, 1906. (The first is a short
note, the second the full memoir.)

See also the memoir by Charlier, cited in (195), section 6 of that memoir 
dealing with the problem of dissection. 

(234) Pearson, Karl, “Contributions to the Mathematical Theory of Evolution” 

(on the dissection of asymmetrical frequency-curves), Phil. Trans. Roy. Soc.,
Series A, vol. 185, 1894, p. 71.

(235) Pearson, Karl, “On some Applications of the Theory of Chance to Racial
Differentiation,” Phil. Mag., 6th Series, vol. 1, 1901, p. 110.





CHAPTER 11. Correlation.

The theory of correlation was first developed on definite assumptions
as to the form of the distribution of frequency, the so-called “normal
distribution” (Chap. 12) being assumed. Sir Francis Galton, in (242)-
(244), developed the practical method, determining his coefficient (Galton’s
function, as it was termed at first) graphically. Edgeworth developed the
theoretical side further in (240), and Pearson introduced the product-sum
formula in (246), both memoirs being written on the assumption of a
“normal” distribution of frequency (cf. Chap. 12). The method used in
this chapter is based on (247) and (248).

(236) Baten, W. D., “Correction for the Moments of a Frequency-distribution in Two
Variables,” Ann. Math. Stats., vol. 2, 1931, p. 309.

(237) Bravais, A., “Analyse mathématique sur les probabilités des erreurs de situation
d’un point,” Acad. des Sciences: Mémoires présentés par divers savants, IIe série,
t. 9, 1846, p. 255.

(238) Darbishire, A. D., “Some Tables for illustrating Statistical Correlation,” Mem.
and Proc. of the Manchester Lit. and Phil. Soc., vol. 51, 1907. (Tables and
diagrams illustrating the meaning of values of the correlation coefficient from
0 to 1 by steps of a twelfth.)

(239) Edgeworth, F. Y., “On a New Method of reducing Observations relating to
Several Quantities,” Phil. Mag., 5th Series, vol. 24, 1887, p. 222, and vol. 25,
1888, p. 184. (A method of treating correlated variables differing entirely from
that described in this chapter, and based on the use of the median: the method
involves the use of trial and error to some extent. For some illustrations see
F. Y. Edgeworth and A. L. Bowley, Jour. Roy. Stat. Soc., vol. 65, 1902, p. 341
et seq.)

(240) Edgeworth, F. Y., “On Correlated Averages,” Phil. Mag., 5th Series, vol. 34,
1892, p. 190.

(241) Frisch, Ragnar, “Correlation and Scatter in Statistical Variables,” Nordic
Statistical Journal, vol. 1, 1929, p. 36.

(242) Galton, Francis, “Regression towards Mediocrity in Hereditary Stature,”
Jour. Anthrop. Inst., vol. 15, 1886, p. 246.

(243) Galton, Francis, “Family Likeness in Stature,” Proc. Roy. Soc., vol. 40, 1886,
p. 42.

(244) Galton, Francis, “Correlations and their Measurement,” Proc. Roy. Soc., vol. 45,
1888, p. 135.

(245) Pearson, Karl, “Notes on the History of Correlation,” Biometrika, vol. 13,
1920, p. 25.

(246) Pearson, Karl, “Regression, Heredity and Panmixia,” Phil. Trans. Roy. Soc.,
Series A, vol. 187, 1896, p. 253.

(247) Yule, G. U., “On the Significance of Bravais’ Formulae for Regression, etc., in
the case of Skew Correlation,” Proc. Roy. Soc., vol. 60, 1897, p. 477.

(248) Yule, G. U., “On the Theory of Correlation,” Jour. Roy. Stat. Soc., vol. 60, 1897,
p. 812.

CHAPTERS 12 AND 13. Normal Correlation and Further
Theory of Correlation. 

General. 

(249) Bravais, A., “Analyse mathématique sur les probabilités des erreurs de situation
d’un point,” Acad. des Sciences: Mémoires présentés par divers savants, IIe série,
t. 9, 1846, p. 255.

(250) Galton, Francis, “Family Likeness in Stature,” Proc. Roy. Soc., vol. 40, 1886,
p. 42.

(251) Galton, Francis, Natural Inheritance; Macmillan & Co., 1889.

(252) Dickson, J. D. Hamilton, Appendix to (250), Proc. Roy. Soc., vol. 40, 1886,
p. 63.




(253) Edgeworth, F. Y., “On Correlated Averages,” Phil. Mag., 5th Series, vol. 34,
1892, p. 190.

(254) Pearson, Karl, “Regression, Heredity and Panmixia,” Phil. Trans. Roy. Soc.,
Series A, vol. 187, 1896, p. 253.

(255) Pearson, Karl, “On Lines and Planes of Closest Fit to Systems of Points in
Space,” Phil. Mag., 6th Series, vol. 2, 1901, p. 559. (On the fitting of “principal
axes” and the corresponding planes in the case of more than two variables.)

(256) Pearson, Karl, “On the Influence of Natural Selection on the Variability and
Correlation of Organs,” Phil. Trans. Roy. Soc., Series A, vol. 200, 1902, p. 1.
(Based on the assumption of normal correlation.)

(257) Pearson, Karl, and Alice Lee, “On the Generalised Probable Error in Multiple
Normal Correlation,” Biometrika, vol. 6, 1908, p. 59.

(258) Sheppard, W. F., “On the Application of the Theory of Error to Cases of Normal
Distribution and Normal Correlation,” Phil. Trans. Roy. Soc., Series A, vol. 192,
1898, p. 101.

(259) Sheppard, W. F., “On the Calculation of the Double-integral expressing Normal
Correlation,” Cambridge Phil. Trans., vol. 19, 1900, p. 23.

(260) Yule, G. U., “On the Theory of Correlation,” Jour. Roy. Stat. Soc., vol. 60, 1897,
p. 812.

(261) Yule, G. U., “On the Theory of Correlation for Any Number of Variables treated
by a New System of Notation,” Proc. Roy. Soc., Series A, vol. 79, 1907, p. 182.


Applications to the Theory of Attributes, etc. 

(262) Pearson, Karl, “On the Correlation of Characters not Quantitatively Measurable,”
Phil. Trans. Roy. Soc., Series A, vol. 195, 1900, p. 1. (Cf. criticism in
ref. (80).)

(263) Pearson, Karl, “On a New Method of Determining Correlation between a
Measured Character A and a Character B of which only the Percentage of
Cases wherein B exceeds (or falls short of) a Given Intensity is recorded for each
grade of A,” Biometrika, vol. 7, 1909, p. 96.

(264) Pearson, Karl, “On a New Method of Determining Correlation, when one
Variable is given by Alternative and the other by Multiple Categories,”
Biometrika, vol. 7, 1910, p. 248.

See also the memoir (258) by Sheppard. 


Various Methods and their Relation to Normal Correlation. 

(265) Pearson, Karl, “On the Theory of Contingency and its Relation to Association
and Normal Correlation,” Drapers’ Company Research Memoirs, Biometric
Series I; Dulau & Co., London, 1904.

(266) Pearson, Karl, “On Further Methods of Determining Correlation,” Drapers’
Company Research Memoirs, Biometric Series IV. (Methods based on correlation
of ranks: difference methods.) Dulau & Co., London, 1907.

(267) Pearson, Karl, and Others (editorial), “Tables for Determining the Volumes of
a Bivariate Normal Surface,” Biometrika, vol. 22, 1930, p. 1.

(268) Spearman, C., “A Footrule for Measuring Correlation,” Brit. Jour. of Psychology,
vol. 2, 1906, p. 89. (The suggestion of a “rank” method: see Pearson’s
criticism and improved formula in (266), and Spearman’s reply on some points
in (269).)

(269) Spearman, C., “Correlation Calculated from Faulty Data,” Brit. Jour. of
Psychology, vol. 3, 1910, p. 271.

(270) Thorndike, E. L., “Empirical Studies in the Theory of Measurement,” Archives
of Psychology (New York), 1907.


Fit of Regression Lines. 

(271) Fisher, R. A., “The Goodness of Fit of Regression Formulae, and the Distribution
of Regression Coefficients,” Jour. Roy. Stat. Soc., vol. 85, 1922, p. 597.

(272) Pearson, Karl, “On the Application of Goodness of Fit Tables to test Regression
Curves and Theoretical Curves used to describe Observational or Experimental
Data,” Biometrika, vol. 11, 1916-17, p. 237.





Correlation in Case of Non-linear Regression. 

(273) Pearson, Karl, “On a General Method of Determining the Successive Terms in a
Skew Regression Line,” Biometrika, vol. 13, 1921, p. 296.

(274) Pearson, Karl, “On the Correction necessary for the Correlation Ratio η,”
Biometrika, vol. 8, 1911, p. 251, and vol. 14, 1923, p. 412.

(275) Pretorius, S. J., “Skew Bivariate Frequency Surfaces, examined in the Light of
Numerical Illustrations,” Biometrika, vol. 22, 1930, p. 109.

(276) Wicksell, S. D., “On Logarithmic Correlation, with an Application to the
Distribution of Ages at First Marriage,” Meddelande från Lunds Astronomiska
Observatorium, No. 81, 1917; Svenska Aktuarieföreningens Tidskrift.

(277) Wicksell, S. D., “The Correlation Function of Type A,” Kungl. Svenska
Vetenskapsakademiens Handl., Bd. 58, 1917.

See also refs. (353)-(355), (377), (378) and (379).


CHAPTER 14. Partial Correlation.

(278) Brown, J. W., M. Greenwood, and Frances Wood, “A Study of Index-correlations,”
Jour. Roy. Stat. Soc., vol. 77, 1914, pp. 317-346. (The partial
or “solid” correlation ratio is used.)

(279) Camp, Burton H., “Mutually Consistent Multiple Regression Surfaces,”
Biometrika, vol. 17, 1925, p. 443.

(280) Edgeworth, F. Y., “On Correlated Averages,” Phil. Mag., 5th Series, vol. 34,
1892, p. 190.

(281) Ezekiel, Mordecai, “The Determination of Curvilinear Regression Surfaces
in the Presence of Other Variables,” Jour. Amer. Stat. Assoc., vol. 21, 1926,
p. 310.

(282) Ezekiel, M., “The Application of the Theory of Error to Multiple and Curvilinear
Correlation,” Jour. Amer. Stat. Assoc., vol. 24, 1929, Supplement, p. 99.

(283) Hall, Philip, “Multiple and Partial Correlation Coefficients in the case of an
n-Fold Variate System,” Biometrika, vol. 19, 1927, p. 100.

(284) Hooker, R. H., and G. U. Yule, “Note on Estimating the Relative Influence of
Two Variables upon a Third,” Jour. Roy. Stat. Soc., vol. 69, 1906, p. 197.

(285) Horst, P., “A General Method of Evaluating Multiple Regression Constants,”
Jour. Amer. Stat. Assoc., vol. 27, 1932, p. 270.

(286) Isserlis, L., “On the Partial Correlation Ratio. Pt. I. Theoretical,” Biometrika,
vol. 10, 1914, pp. 391-411.

(287) Isserlis, L., “On the Partial Correlation Ratio. Pt. II. Numerical,” Biometrika,
vol. 11, 1916-17, p. 50.

(288) Kelley, T. L., and F. S. Salisbury, “An Iteration Method for determining
Multiple Correlation Constants,” Jour. Amer. Stat. Assoc., vol. 21, 1926, p. 282.

(289) Kelley, T. L., and Q. McNemar, “Doolittle versus the Kelley-Salisbury Iteration
Method for Computing Multiple Regression Coefficients,” Jour. Amer.
Stat. Assoc., vol. 24, 1929, p. 164.

(290) Pearson, Karl, “Regression, Heredity and Panmixia,” Phil. Trans. Roy. Soc.,
Series A, vol. 187, 1896, p. 253.

(291) Pearson, Karl, “On the Partial Correlation Ratio,” Proc. Roy. Soc., Series A,
vol. 91, 1915, p. 492.

(292) Romanovsky, V., “Sulle Regressioni Multiple,” Giorn. dell’Ist. Ital. degli Attuari,
anno 2, 1931.

(293) Tappan, M., “On Partial Multiple Correlation Coefficients in a Universe of
Manifold Characteristics,” Biometrika, vol. 19, 1927, p. 39.

(294) Thomson, G. H., “On the Computation of Regression Equations, Partial Correlations,
etc.,” Brit. Jour. Psych., vol. 23, 1932, p. 64.

(295) Tschuprow, A. A., transl. by L. Isserlis, “The Mathematical Theory of the
Statistical Methods employed in the Study of Correlation in the case of Three
Variables,” Trans. Camb. Phil. Soc., vol. 23, 1928, p. 337.

(296) Yule, G. U., “On the Significance of Bravais’ Formulae for Regression, etc., in the
case of Skew Correlation,” Proc. Roy. Soc., vol. 60, 1897, p. 477.

(297) Yule, G. U., “On the Theory of Correlation,” Jour. Roy. Stat. Soc., vol. 60, 1897,
p. 812.

(298) Yule, G. U., “On the Theory of Correlation for Any Number of Variables treated
by a New System of Notation,” Proc. Roy. Soc., Series A, vol. 79, 1907, p. 182.





Illustrative Applications of Economic Interest. 

(299) Hooker, R. H., “The Correlation of the Weather and the Crops,” Jour. Roy. Stat.
Soc., vol. 70, 1907, p. 1.

(300) Snow, E. C., “The Application of the Method of Multiple Correlation to the
Estimation of Post-censal Populations,” Jour. Roy. Stat. Soc., vol. 74, 1911,
p. 575.

(301) Yule, G. U., “An Investigation into the Causes of Changes in Pauperism in
England, etc.,” Jour. Roy. Stat. Soc., vol. 62, 1899, p. 249.


CHAPTER 15. Correlation: Illustrations and Practical Methods.

(302) Anderson, Oskar, Die Korrelationsrechnung in der Konjunkturforschung (Frankfurter
Gesellschaft für Konjunkturforschung); Kurt Schroeder, Bonn, 1929.

(303) Anderson, O., “Nochmals über ‘The Elimination of Spurious Correlation due to
Position in Time or Space,’” Biometrika, vol. 10, 1914, pp. 269-279. (Detailed
theory of the method discussed by “Student” in (327).)

(304) Anderson, O., “Ueber ein neues Verfahren bei Anwendung der ‘Variate-difference’
Methode,” Biometrika, vol. 15, 1923, p. 134.

(305) Anderson, O., “Ueber die Anwendung der Differenzenmethode (Variate-difference
Method) bei Reihenausgleichungen, Stabilitätsuntersuchungen, und
Korrelationsmessungen,” Biometrika, vol. 18, 1926, p. 293.

(306) Anderson, O., “On the Logic of the Decomposition of Statistical Series into
Separate Components,” Jour. Roy. Stat. Soc., vol. 90, 1927, p. 548.

(307) Cave-Browne-Cave, F. E., “On the Influence of the Time Factor on the Correlation
between the Barometric Heights at Stations more than 1000 miles apart,”
Proc. Roy. Soc., vol. 74, 1904, pp. 403-413.

(308) Cave, Beatrice M., and Karl Pearson, “Numerical Illustrations of the
Variate-difference Correlation Method,” Biometrika, vol. 10, 1914, pp. 340-355.

(309) Darmois, G., “Analyse et comparaison des séries statistiques qui se développent
dans le temps,” Metron, vol. 8, Nos. 1-2, 1929, p. 211.

(310) Frisch, Ragnar, “A Method of Decomposing an Empirical Series into its Cyclical
and Progressive Components,” Jour. Amer. Stat. Assoc., vol. 26, 1931, Supplement,
p. 73.

(311) Gumbel, E. J., “Spurious Correlation and its Significance in Physiology,” Jour.
Amer. Stat. Assoc., vol. 21, 1926, p. 179.

(312) Harris, J. Arthur, “The Correlation between a Component, and between the
Sum of Two or More Components, and the Sum of the Remaining Components
of a Variable,” Quart. Pub. Amer. Stat. Assoc., vol. 15, 1917, p. 854.

(313) Heron, D., On the Relation of Fertility in Man to Social Status, “Drapers’ Co.
Research Memoirs: Studies in National Deterioration,” I; Dulau & Co.,
London, 1906.

(314) Hooker, R. H., “On the Correlation of the Marriage-rate with Trade,” Jour.
Roy. Stat. Soc., vol. 64, 1901, p. 485.

(315) Hooker, R. H., “On the Correlation of Successive Observations, illustrated by
Corn Prices,” ibid., vol. 68, 1905, p. 696.

(316) Hooker, R. H., “The Correlation of the Weather and the Crops,” ibid., vol. 70,
1907, p. 1.

(317) Hotelling, H., “An Application of Analysis Situs to Statistics,” Bull. Amer.
Math. Soc., July-August 1927, p. 467.

(318) Jacob, S. M., “On the Correlations of Areas of Matured Crops and the Rainfall,”
Mem. Asiatic Soc. Bengal, vol. 2, 1910, p. 347.

(319) Jordan, Charles, “Sur la détermination de la tendance séculaire des grandeurs
statistiques par la méthode des moindres carrés,” Jour. de la Société Hongroise
de Statistique, vol. 7, 1929, p. 587.

(320) Macaulay, F. R., “Smoothing of Time Series,” New York, National Bureau of
Economic Research, 1931.

(321) March, L., “Comparaison numérique de courbes statistiques,” Jour. de la Société
de Statistique de Paris, 1905, pp. 255 and 308.

(322) Norton, J. P., Statistical Studies in the New York Money Market; Macmillan
Co., New York, 1902. (Applications to financial statistics: an instantaneous
average method, analogous to that of Example 15.5, is employed, but the
instantaneous average is obtained by an interpolated logarithmic curve.)





(323) Pearson, Karl, Alice Lee, and L. Bramley Moore, “Genetic (reproductive)
Selection: Inheritance of Fertility in Man and of Fecundity in Thoroughbred
Racehorses,” Phil. Trans. Roy. Soc., Series A, vol. 192, 1899, p. 257.

(324) Pearson, Karl, and E. M. Elderton, “On the Variate-difference Method,”
Biometrika, vol. 14, 1923, p. 281.

(325) Sipos, Alexander, “Practical Application of Jordan’s Method for Trend
Measurement;” Victor Hornyánszky Co., Ltd., Budapest, 1930.

(326) Smith, B. B., “Combining the Advantages of First-difference and
Deviation-from-Trend Methods of Correlating Time Series,” Jour. Amer. Stat. Assoc.,
vol. 21, 1926, p. 55.

(327) “Student,” “The Elimination of Spurious Correlation due to Position in Time or
Space,” Biometrika, vol. 10, 1914, pp. 179-180. (The extension of the difference
method by the use of successive differences.)

(328) Wicksell, S. D., “An Exact Formula for Spurious Correlation,” Metron, vol. 1,
No. 4, 1921, p. 33.

(329) Will, Harry S., “On Fitting Curves to Observational Series by the Method of
Differences,” Ann. Math. Stats., vol. 1, 1930, p. 159.

(330) Working, H., and H. Hotelling, “Applications of the Theory of Error to the
Interpretation of Trends,” Jour. Amer. Stat. Assoc., vol. 24, 1929, Supplement,
p. 73.

(331) Yule, G. U., “On the Time-correlation Problem,” Jour. Roy. Stat. Soc., vol. 84,
1921, p. 497.

(332) Yule, G. U., “Why do we sometimes get Nonsense Correlations between Time
Series? A Study in Sampling and the Nature of Time Series,” Jour. Roy. Stat.
Soc., vol. 89, 1926, p. 1.

(333) Yule, G. U., “On the Correlation of Total Pauperism with Proportion of
Out-relief,” Economic Jour., vol. 5, 1895, p. 603, and vol. 6, 1896, p. 613.

(334) Yule, G. U., “An Investigation into the Causes of Changes in Pauperism in
England chiefly during the last two Intercensal Decades,” Jour. Roy. Stat. Soc.,
vol. 62, 1899, p. 249.

(335) Yule, G. U., “On the Changes in the Marriage- and Birth-rates in England and
Wales during the past Half-century, with an Inquiry as to their probable
Causes,” Jour. Roy. Stat. Soc., vol. 69, 1906, p. 88.

CHAPTER 16. Miscellaneous Theorems Involving the Use
of the Correlation Coefficient. 

Effect of Errors of Observation on the Correlation Coefficient. 

(336) Brown, W., “Some Experimental Results in Correlation,” Proceedings of the Sixth
International Congress of Psychology, Geneva, August 1909.

(337) Hart, Bernard, and C. Spearman, “General Ability, its Existence and Nature,”
Brit. Jour. Psych., vol. 5, 1912, p. 51. (For controversy about these formulae,
cf. ref. (14), Brown and Thomson, and references there given, critical notice in
Brit. Jour. Psych., vol. 12, 1921, p. 100, and also (342) below.)

(338) Jacob, S. M., “On the Correlations of Areas of Matured Crops and the Rainfall,”
Mem. Asiatic Soc. Bengal, vol. 2, 1910, p. 347. (§ 7 contains remarks on the
effects of errors on the correlations and regressions, with especial reference to
this problem.)

(339) Spearman, C., “The Proof and Measurement of Association between Two Things,”
Amer. Jour. Psych., vol. 15, 1904, p. 88.

(340) Spearman, C., “Demonstration of Formulae for True Measurement of Correlation,”
Amer. Jour. Psych., vol. 18, 1907, p. 161.

(341) Spearman, C., “Correlation Calculated from Faulty Data,” Brit. Jour. Psych.,
vol. 3, 1910, p. 271.

(342) Stead, H. G., “The Correction of Correlation Coefficients,” Jour. Roy. Stat. Soc.,
vol. 86, 1923, p. 412.

Correlations between Indices, etc. 

(343) Brown, J. W., M. Greenwood, and Frances Wood, “A Study of Index-correlations,”
Jour. Roy. Stat. Soc., vol. 77, 1914, pp. 317-346.

(344) Galton, Francis, “Note to the Memoir by Prof. Karl Pearson on Spurious
Correlation,” Proc. Roy. Soc., vol. 60, 1897, p. 498. (See (345) below.)


(345) Pearson, Karl, “On a Form of Spurious Correlation which may arise when
Indices are used in the Measurement of Organs,” Proc. Roy. Soc., vol. 60, 1897,
p. 489. (§§ 8, ft.)

(346) Yule, G. U., “On the Interpretation of Correlations between Indices or Ratios,”
Jour. Roy. Stat. Soc., vol. 73, 1910, p. 644.

The Weighted Mean. 

(847) Peak SON, Kakl, “Note on Reproductive Selection,” Proc. Hoy. Soc ., vol. 59, 
1890, p. 801. 

Standardisation or Correction of Death-rates, etc. 

For the methods of standardisation in present use in England and
Wales, see Seventy-fourth Annual Report of the Registrar-General of England
and Wales, 1911, Cd. 6578, 1913.

Papers (349) and (351) suggested methods of standardising the birth-rate.

(348) Heron, David, “The Influence of Defective Physique and Unfavourable Home
Environment on the Intelligence of School-children,” Eugenics Laboratory
Memoirs, 8; Dulau & Co., London, 1910.

(349) Newsholme, A., and T. H. C. Stevenson, “The Decline of Human Fertility in
the United Kingdom and other Countries, as shown by Corrected Birth-rates,”
Jour. Roy. Stat. Soc., vol. 69, 1906, p. 34.

(350) Wolfenden, H. H., “On the Methods of Comparing the Mortalities of Two or
More Communities, and the Standardisation of Death-rates,” Jour. Roy. Stat.
Soc., vol. 86, 1923, p. 399.

(351) Yule, G. U., “On the Changes in the Marriage- and Birth-rates in England and
Wales during the past Half-century, etc.,” Jour. Roy. Stat. Soc., vol. 69, 1906,
p. 88.

(352) Yule, G. U., “On Some Points Relating to Vital Statistics, more especially
Statistics of Occupational Mortality,” Jour. Roy. Stat. Soc., vol. 97, 1934, p. 1.
(Contains a full discussion of methods of standardisation.)

Theory of Correlation in the case of Non-linear Regression. 

See refs. (278*) (277) and the following:

(353) Blakeman, J., “On Tests for Linearity of Regression in Frequency-distributions,”
Biometrika, vol. 4, 1905, p. 332.

(354) Pearson, Karl, On the General Theory of Skew Correlation and Non-linear
Regression. “Drapers' Co. Research Memoirs: Biometric Series,” II; Dulau &
Co., London, 1905. (The “correlation ratio.”)

(355) Pearson, Karl, “On a Correction to be made to the Correlation Ratio,”
Biometrika, vol. 8, 1911, p. 251, and vol. 14, 1923, p. 412.

Abbreviated Methods of Calculation. 

(356) Harris, J. Arthur, “A Short Method of Calculating the Coefficient of Correlation
in the case of Integral Variates,” Biometrika, vol. 7, 1909, p. 214. (Not an
approximation, but a true short method.)

(357) Harris, J. Arthur, “On the Calculation of Intra-class and Inter-class Coefficients
of Correlation from Class-moments when the Number of possible Combinations
is large,” Biometrika, vol. 9, 1914, pp. 446-472.

CHAPTER 17. Simple Curve Fitting. 

See refs. (319), (329) of Chapter 15, and the following:

(358) Aitken, A. C., “On the Graduation of Data by the Orthogonal Polynomials of
Least Squares,” Proc. Roy. Soc. Edin., vol. 53, 1933, p. 51.

(359) Aitken, A. C., “On Fitting Polynomials to Weighted Data by Least Squares,”
Proc. Roy. Soc. Edin., vol. 54, 1933, p. 1; and “On Fitting Polynomials to
Data with Weighted and Correlated Errors,” Proc. Roy. Soc. Edin., vol. 54,
1933, p. 12.





(360) Aitken, A. C., “On the Orthogonal Polynomials in Frequencies of Type B,”
Proc. Roy. Soc. Edin., vol. 52, 1932, p. 174.

(361) Aitken, A. C., and A. Oppenheim, “On Charlier's New Form of the Frequency
Function,” Proc. Roy. Soc. Edin., vol. 51, 1931, p. 35.

(362) Allan, F. E., “The General Form of the Orthogonal Polynomials for Simple
Series, with Proofs of their Simple Properties,” Proc. Roy. Soc. Edin., vol. 50,
1930, p. 310.

(363) Birge, R. T., and J. D. Shea, “A Rapid Method of Calculating the Least Squares
Solution of a Polynomial of Any Degree,” Univ. of California Pub. in Maths.,
vol. 2, 1927, p. 67.

(364) Chotimsky, V., The Smoothing of Statistical Series by Least Squares (Tschebycheff's
Method). (In Russian.) Soviet Press, Moscow and Leningrad, 1925.

(365) Condon, E., “The Rapid Fitting of a Certain Class of Empirical Formulae by
the Method of Least Squares,” Univ. of California Pub. in Maths., vol. 2, 1927,
p. 55.

(366) Davis, H. T., “Polynomial Approximation by the Method of Least Squares,”
Ann. Math. Stats., vol. 4, 1933, p. 155.

(367) Davis, H. T., and V. V. Latshaw, “Formulae for the Fitting of Polynomials to
Data by the Method of Least Squares,” Ann. Math. (2nd Series), vol. 31, 1930,
No. 1, p. 52.

(368) Fisher, R. A., “Studies in Crop Variation: I,” Jour. Agricultural Science,
vol. 11, 1921, p. 107.

(369) Gini, C., “Sull' interpolazione di una retta quando i valori della variabile
indipendente sono affetti da errori accidentali,” Metron, vol. 1, 1922, part 3, p. 53.

(370) Gini, C., “Considerazioni sull' interpolazione e la perequazione delle serie
statistiche,” Metron, vol. 1, 1922, part 3, p. 3.

(371) Gram, J. P., “Om Raekkeudviklinger bestemte ved Hjaelp af de mindste
Kvadraters Methode,” 1879, Copenhagen. Reprinted as “Über die Entwicklung
reeller Functionen in Reihen mittelst der Methode der kleinsten Quadrate,”
Jour. für Math., vol. 94, 1883, p. 41.

(372) Greenleaf, H. E. H., “Curve Approximation by Means of Functions Analogous
to the Hermite Polynomials,” Ann. Math. Stats., vol. 3, 1932, p. 201. (Contains
references.)

(373) Hendricks, W. A., “The Use of the Relative Residual in the Application of the
Method of Least Squares,” Ann. Math. Stats., vol. 2, 1931, p. 458.

(374) Isserlis, L., “Note on Tchebycheff's Interpolation Formula,” Biometrika, vol. 19,
1927, p. 87.

(375) Jordan, Ch., Statistique mathématique; Gauthier-Villars, Paris, 1927.

(376) Jordan, Ch., “Approximation and Graduation according to the Principle of
Least Squares by Orthogonal Polynomials,” Ann. Math. Stats., vol. 3, 1932,
p. 257.

(377) Pearson, Karl, “On the Systematic Fitting of Curves to Observations and
Measurements,” Biometrika, vol. 1, 1901, p. 265, and vol. 2, 1902, p. 1.

(378) Pearson, Karl, “On Lines and Planes of Closest Fit to Systems of Points in
Space,” Phil. Mag., 6th Series, vol. 2, 1901, p. 579.

(379) Pearson, Karl, “On a General Theory of the Method of False Position,” Phil.
Mag., 6th Series, vol. 5, 1903.

(380) Pearson, Karl, “On a General Method of Determining the Successive Terms in
a Skew Regression Line,” Biometrika, vol. 13, 1921, p. 296.

(381) Pietra, G., “Interpolating Plane Curves,” Metron, vol. 3, 1924, p. 311.

(382) Reed, L. J., “Fitting Straight Lines,” Metron, vol. 1, 1922, part 3, p. 51.

(383) Rhodes, E. C., “On the Fitting of Parabolic Curves to Statistical Data,” Jour.
Roy. Stat. Soc., vol. 93, 1930, p. 569.

(384) Romanovsky, V., “Note on Orthogonalising Series of Functions and Interpolation,”
Biometrika, vol. 19, 1927, p. 93.

(385) Snow, E. C., “On Restricted Lines and Planes of Closest Fit to Systems of Points
in Any Number of Dimensions,” Phil. Mag., 6th Series, vol. 21, 1911, p. 367.

(386) Tschebycheff, P. L. See numerous papers in his collected works, Œuvres.

(387) Whittaker and Robinson, Calculus of Observations; Blackie & Son, London,
2nd Ed., 1932.





CHAPTER 18. Preliminary Notions on Sampling. 

Theory of Probability and its Applications to Statistics. 

(388) Keynes, J. M., A Treatise on Probability; Macmillan, London, 1923.

(389) Poincaré, H., Calcul des Probabilités; Gauthier-Villars, Paris, 1896.

(390) Venn, J. A., The Logic of Chance; Macmillan, London, 3rd Ed., 1888.

(388) and (390) treat of probability from the point of view of its logical 
and philosophical foundations, and give a useful general introduction to the 
subject. See also refs. (7) and (9). 

Bias in Sampling. 

(391) Kiser, C. V., “Pitfalls in Sampling for Population Study,” Jour. Amer. Stat.
Assoc., vol. 29, 1934, pp. 250-256.

(392) Yates, F., “Some Examples of Biased Sampling,” Annals of Eugenics, vol. 6,
1935, pp. 202-213.

Various Sampling Methods. 

(393) Bowley, A. L., “Working-class Households in Reading,” Jour. Roy. Stat. Soc.,
vol. 76, 1913, p. 672.

(394) Bowley, A. L., “Measurement of the Precision attained in Sampling,” Bull. Int.
Stat. Inst., vol. 22, 1ère livre.

(394a) Hilton, John, “Enquiry by Sample; an Experiment and its Results,” Jour.
Roy. Stat. Soc., vol. 87, 1924, p. 541.

(395) Jensen, A., “Report on the Representative Method in Statistics,” Bull. Int. Stat.
Inst., vol. 22, 1ère livre.

(396) Jensen, A., “Purposive Selection,” Jour. Roy. Stat. Soc., vol. 91, 1928, pp. 511-
517.

(397) Neyman, J., “On Two Different Aspects of the Representative Method: the
Method of Stratified Sampling and the Method of Purposive Selection,” Jour.
Roy. Stat. Soc., vol. 97, 1934, pp. 558-625.

CHAPTER 19. Sampling of Attributes - Large Samples. 

(Including references to experimental results of dice-throwing, etc.)

(398) Darbishire, A. D., “Some Tables for Illustrating Statistical Correlation,” Mem.
and Proc. of the Manchester Lit. and Phil. Soc., vol. 51, 1907.

(399) Detlefsen, J. A., “Fluctuations of Sampling in a Mendelian Population,”
Genetics, vol. 3, 1918, p. 599.

(400) Edgeworth, F. Y., “Miscellaneous Applications of the Calculus of Probabilities,”
Jour. Roy. Stat. Soc., vols. 60, 61, 1897-98 (especially part 2, vol. 61, p. 119).

(401) Edgeworth, F. Y., Article on the “Law of Error” in the Tenth Edition of the
Encyclopaedia Britannica, vol. 28, 1902, p. 280; or on “Probability,” Eleventh
Edition, vol. 22 (especially Part 2, pp. 390 et seq.).

(402) Edgeworth, F. Y., “Methods of Statistics,” Jour. Roy. Stat. Soc., jubilee volume,
1885, p. 181.

(403) Greenwood, M., “On Errors of Random Sampling in certain Cases not suitable
for the Application of a ‘Normal Curve of Frequency,’” Biometrika, vol. 9,
1913, pp. 69-90. (If an event has succeeded p times in n trials, what are the
chances of 0, 1, . . . m successes in m subsequent trials? Tables for small
samples.)

(404) Lexis, W., Zur Theorie der Massenerscheinungen in der menschlichen Gesellschaft;
Freiburg, 1877.

(405) Lexis, W., Abhandlungen zur Theorie der Bevölkerungs- und Moralstatistik; Fischer,
Jena, 1903. (Contains, with new matter, reprints of some of Professor Lexis'
earlier papers in a form convenient for reference.)

(405a) Parkes, A. S., “Studies on the Sex Ratio and Related Phenomena,” Biometrika,
vol. 15, 1923, p. 373.

(406) Pearson, Karl, “Skew Variation in Homogeneous Material,” Phil. Trans. Roy.
Soc., Series A, vol. 186, 1895, p. 343. (Sections 2 to 6 on the binomial
distribution.)





(407) Pearson, Karl, “On certain Properties of the Hypergeometrical Series, and on
the fitting of such Series to Observation Polygons in the Theory of Chance,”
Phil. Mag., 5th Series, vol. 47, 1899, p. 230. (An expansion of one section
of ref. (406), dealing with the problem of drawing samples from a bag containing
a limited number of white and black balls, from the standpoint of the
frequency-distribution of the number of white or black balls in the samples.)

(408) Pearson, Karl, “On the Difference and the Doublet Tests for Ascertaining
whether Two Samples have been drawn from the Same Population,”
Biometrika, vol. 16, 1924, p. 219.

(409) Poisson, S. D., “Sur la proportion des naissances des filles et des garçons,”
Mémoires de l'Acad. des Sciences, vol. 9, 1829, p. 239. (Principally theoretical:
the statistical illustrations very slight.)

(410) Rhodes, E. C., “On the Problem whether Two Given Samples can be supposed
to have been drawn from the Same Population,” Biometrika, vol. 16, 1924,
p. 239, and Metron, vol. 5, 1925, p. 3.

(411) Venn, John, The Logic of Chance, 3rd Ed.; Macmillan, London, 1888.

(412) Vigor, H. D., and G. U. Yule, “On the Sex Ratios of Births in the Registration
Districts of England and Wales, 1881-90,” Jour. Roy. Stat. Soc., vol. 69, 1906,
p. 570.

(413) Westergaard, H., Die Grundzüge der Theorie der Statistik; Fischer, Jena, 1890,
and 2nd Ed., enlarged, with H. C. Nybølle, 1928.

(414) Yule, G. U., “Fluctuations of Sampling in Mendelian Ratios,” Proc. Camb. Phil.
Soc., vol. 17, 1914, p. 425.

See also under Binomial, Normal Curve, Chapter 10, and the General
References for Standard Errors below, Chapters 20-21.

CHAPTERS 20 AND 21. Sampling of Variables—Large Samples. 

The probable errors of various special coefficients, etc., are generally
dealt with in the memoirs concerning them, reference to which has been
made in the lists of previous chapters: reference has also been made
before to most of the memoirs concerning errors of sampling in proportions
or percentages. The following is a classification of some of the
memoirs in the list below:

General: (415), (421), (422), (424), (425), (426), (429), (431), (437), (444), (447),
(452), (453), (455), (459), (460), (468).

Averages and percentiles: (416), (427), (428), (430), (436), (442), (445), (446),
(475), (482), (483).

Standard deviation: (423), (428), (432), (449), (454), (470), (475).

Coefficient of correlation (product-sum and partial correlations): (417), (428),
(434), (435), (441), (457), (470), (478), (479), (490), (491).

Coefficient of correlation, other methods, etc.: (418), (443), (456), (465), (467),
(481), (487).

Coefficients of association: (491).

Coefficient of contingency: (419), (448), (466), (489).

Moments: (437), (480), (484), (485), (486), (488).

Coefficient of variation: (451).

(415) Baker, George A., “Random Samples from Non-homogeneous Populations,”
Metron, vol. 8, No. 3, 1930, p. 67.

(416) Baker, George A., “Distribution of the Means of Samples of n drawn at random
from a Population represented by a Gram-Charlier Series,” Ann. Math. Stats.,
vol. 1, 1930, p. 199, and note by C. C. Craig, ibid., vol. 2, 1931, p. 99.

(417) Bispham, J. W., “An Experimental Determination of the Distribution of the
Partial Correlation Coefficient in Samples of Thirty,” Proc. Roy. Soc., A, vol. 97,
1920, and Metron, vol. 2, 1923, p. 684.

(418) Blakeman, J., “On Tests for Linearity of Regression in Frequency-distributions,”
Biometrika, vol. 4, 1905, p. 332.

(419) Blakeman, J., and Karl Pearson, “On the Probable Error of the Coefficient of
Mean Square Contingency,” Biometrika, vol. 5, 1906, p. 191.

(420) Bortkiewicz, L. von, “The Relation between Stability and Homogeneity,”
Ann. Math. Stats., vol. 2, 1931, p. 1.




(421) Bowley, A. L., The Measurement of Groups and Series; C. & E. Layton, London,
1903.

(422) Carver, H. C., “Fundamentals of the Theory of Sampling,” Ann. Math. Stats.,
vol. 1, 1930, pp. 101 and 260.

(423) Carver, H. C., “The Interdependence of Sampling and Frequency-distribution
Theory,” Ann. Math. Stats., vol. 2, 1931, p. 82.

(424) Craig, C. C., “An Application of Thiele's Seminvariants to the Sampling
Problem,” Metron, vol. 7, 1928, p. 3.

(425) Craig, C. C., “Sampling in the case of Correlated Observations,” Ann. Math.
Stats., vol. 2, 1931, p. 321.

(426) Craig, C. C., “Note on the Distribution of Samples of N drawn from a Type A 

Population,” Ann. Math. Stats., vol. 2, 1931, p. 99. 

(427) Dodd, E. L., “The Probability of the Arithmetic Mean compared with that of
certain other Functions of the Measurements,” Ann. Maths., vol. 14, 1912-13.

(428) Dunlap, H. F., “An Empirical Determination of the Distribution of Means,
Standard Deviations and Correlation Coefficients drawn from Rectangular
Populations,” Ann. Math. Stats., vol. 2, 1931, p. 66.

(428a) Edgeworth, F. Y., “Observations and Statistics: An Essay on the Theory of
Errors of Observation and the First Principles of Statistics,” Cambridge Phil.
Trans., vol. 14, 1885, p. 139.

(429) Edgeworth, F. Y., “Problems in Probabilities,” Phil. Mag., 5th Series, vol. 22,
1886, p. 371.

(430) Edgeworth, F. Y., “The Choice of Means,” Phil. Mag., 5th Series, vol. 24,
1887, p. 208.

(431) Edgeworth, F. Y., “On the Probable Errors of Frequency Constants,” Jour.
Roy. Stat. Soc., vol. 71, 1908, pp. 381, 499, 651; and Addendum, vol. 72, 1909,
p. 81.

(432) Feldman, H. M., “The Distribution of the Precision Constant and its Square
in Samples from a Normal Population,” Ann. Math. Stats., vol. 3, 1932, p. 20.

(433) Fieller, E. C., “The Distribution of the Index in a Normal Bivariate Population,”
Biometrika, vol. 24, 1932, p. 428.

(434) Fisher, R. A., “The Frequency Distribution of the Values of the Correlation
Coefficient in Samples from an Indefinitely Large Population,” Biometrika,
vol. 10, 1915, p. 507.

(435) Fisher, R. A., “The Distribution of the Partial Correlation Coefficient,” Metron,
vol. 3, 1924, p. 329.

(436) Fisher, R. A., “A Mathematical Examination of the Methods of Determining
the Accuracy of an Observation by the Mean Error and the Mean Square
Error,” Monthly Notices Royal Ast. Soc., vol. 80, 1920, p. 75.

(437) Fisher, R. A., “Moments and Product-moments of Sampling Distributions,”
Proc. Lond. Math. Soc., Series 2, vol. 30, 1928, p. 199.

(438) Fisher, R. A., “The Moments of the Distribution for Normal Samples of Measures
of Departure from Normality,” Proc. Roy. Soc., A, vol. 130, 1930, p. 16.

(439) Gibson, Winifred, “Tables for Facilitating the Computation of Probable
Errors,” Biometrika, vol. 4, 1906, p. 385.

(440) Heron, D., “An Abac to determine the Probable Errors of Correlation Coefficients,”
Biometrika, vol. 7, 1910, p. 411. (A diagram giving the probable
error for any number of observations up to 1000.)

(441) Heron, D., “On the Probable Error of a Partial Correlation Coefficient,”
Biometrika, vol. 7, 1910, p. 111.

(442) Hojo, T., “Distribution of the Median, Quartiles and Interquartile Distance in
Samples from a Normal Population,” Biometrika, vol. 23, 1931, p. 315.

(443) Holzinger, K. S., and A. E. R. Church, “On the Means of Samples from a
U-shaped Population,” Biometrika, vol. 20A, 1929, p. 361.

(444) Hotelling, Harold, “The Distribution of Correlation Ratios Calculated from
Random Data,” Proc. Nat. Acad. Sci., vol. 11, 1925, p. 657.

(445) Hotelling, H., “The Consistency and Ultimate Distribution of Optimum
Statistics,” Trans. Amer. Math. Soc., vol. 32, 1930, p. 847.

(445a) Irwin, J. O., “On the Frequency-distribution of the Means of Samples from a
Population having Any Law of Frequency with Finite Moments, etc.,”
Biometrika, vol. 19, 1927, p. 225, and vol. 22, 1929, p. 431.

(446) Irwin, J. O., “On the Frequency-distribution of the Means of Samples from
Populations of certain of Pearson's Types,” Metron, vol. 7, No. 4, 1930,
p. 51.





(447) Isserlis, L., “On the Conditions under which the ‘Probable Errors’ of Frequency-
distributions have a real Significance,” Proc. Roy. Soc., Series A, vol. 92, 1915,
p. 23.

(448) Kondo, T., “On the Standard Error of the Mean Square Contingency,”
Biometrika, vol. 21, 1929, p. 376.

(449) Kondo, T., “A Theory of the Sampling Distribution of Standard Deviations,”
Biometrika, vol. 22, 1930, p. 30.

(450) Laplace, Pierre Simon, Marquis de, Théorie des probabilités, 2e édn., 1814.
(With four supplements.)

(451) McKay, A. T., “The Distribution of the Estimated Coefficient of Variation,”
Jour. Roy. Stat. Soc., vol. 94, 1931, p. 504.

(452) Meidell, M. Birger, “Sur la probabilité des erreurs,” Comptes rendus, vol.
176, 1923, p. 280.

(453) Pearl, Raymond, “The Calculation of Probable Errors of Certain Constants
of the Normal Curve,” Biometrika, vol. 5, 1906, p. 190.

(454) Pearl, Raymond, “On certain Points concerning the Probable Error of the
Standard Deviation,” Biometrika, vol. 6, 1908, p. 112. (On the amount of
divergence, in certain cases, from the standard error σ/√(2n) in the case of a
normal distribution.)

(455) Pearson, Egon S., “A Further Development of Tests for Normality,”
Biometrika, vol. 22, 1930, p. 239.

(456) Pearson, E. S., “The Probable Error of a Class-index Correlation,” Biometrika,
vol. 14, 1923, p. 201.

(457) Pearson, E. S., “Note on the Approximations to the Probable Error of a
Coefficient of Correlation,” Biometrika, vol. 16, 1924, p. 190.

(458) Pearson, E. S., “The Percentage Limits for the Distribution of Range in Samples
from a Normal Population,” Biometrika, vol. 24, 1932, p. 404.

(459) Pearson, Karl, and L. N. G. Filon, “On the Probable Errors of Frequency
Constants, and on the Influence of Random Selection on Variation and Correlation,”
Phil. Trans. Roy. Soc., Series A, vol. 191, 1898, p. 229.

(460) Pearson, Karl (editorial), “On the Probable Errors of Frequency Constants,
Part 1,” Biometrika, vol. 2, 1903, p. 273, “Part 2,” ibid., vol. 9, 1913, p. 1,
and “Part 3,” ibid., vol. 13, 1920, p. 113. (Useful for the general formulae
given, based on the general case without respect to the form of the frequency-
distribution.)

(461) Pearson, Karl, “On the Criterion that a Given System of Deviations from the
Probable in the case of a Correlated System of Variables is such that it can be
Reasonably Supposed to have Arisen from Random Sampling,” Phil. Mag.,
vol. 50, Series 5, 1900, p. 157.

(462) Pearson, Karl, “On the Curves which are most suitable for describing the
Frequency of Random Samples of a Population,” Biometrika, vol. 5, 1906,
p. 172.

(463) Pearson, Karl, “Note on the Significant or Non-significant Character of a
Sub-sample drawn from a Sample,” Biometrika, vol. 5, 1906, p. 181.

(464) Pearson, Karl, “On the Probability that two Independent Distributions of
Frequency are really Samples from the same Population,” Biometrika, vol. 8,
1911, p. 250, and vol. 10, 1914, p. 85.

(465) Pearson, Karl, “On the Probable Error of a Coefficient of Correlation as found
from a Fourfold Table,” Biometrika, vol. 9, 1913, p. 22.

(466) Pearson, Karl, “On the Probable Error of a Coefficient of Mean Square
Contingency,” Biometrika, vol. 10, 1915, p. 590.

(467) Pearson, Karl, “On the Probable Error of Biserial η,” Biometrika, vol. 11,
1915-17, p. 292.

(468) Pearson, Karl, and Brenda Stoessiger, “Tables of the Probability Integrals
of Symmetrical Frequency-curves in the case of Low Powers, such as arise in
the Theory of Small Samples,” Biometrika, vol. 22, 1931, p. 253.

(469) Pearson, Karl, “On the Nature of the Relationship between Two of ‘Student's’
Variates (z₁ and z₂) when Samples are taken from a Bivariate Normal Population,”
Biometrika, vol. 22, 1931, p. 405.

(470) Pearson, Karl, “Historical Note on the Distribution of Standard Deviations
of Samples of Any Size from an Indefinitely Large Normal Parent Population,”
Biometrika, vol. 23, 1931, p. 416.

(471) Pepper, Joseph, “Studies in the Theory of Sampling,” Biometrika, vol. 21,
1929, p. 231.




(472) Pepper, Joseph, “The Sampling Distribution of the Third Moment Coefficient:
An Experiment,” Biometrika, vol. 24, 1932, p. 55.

(473) Rhind, A., “Tables for Facilitating the Computation of Probable Errors of the
Chief Constants of Skew Frequency-distributions,” Biometrika, vol. 7, 1909-10,
pp. 127 and 386.

(474) Rhodes, E. C., “The Comparison of Two Sets of Observations,” Jour. Roy. Stat.
Soc., vol. 89, 1926, p. 544.

(475) Rhodes, E. C., “The Precision of Means and Standard Deviations when the
Individual Errors are Correlated,” Jour. Roy. Stat. Soc., vol. 90, 1927, p. 105.

(476) St Georgescu, N., “Further Contributions to the Sampling Problem,” Biometrika,
vol. 24, 1932, p. 65.

(477) Sheppard, W. F., “On the Application of the Theory of Error to Cases of Normal
Distribution and Normal Correlation,” Phil. Trans. Roy. Soc., Series A, vol. 192,
1898, p. 101.

(478) Soper, H. E., “On the Probable Error of the Correlation Coefficient to a Second
Approximation,” Biometrika, vol. 9, 1913, p. 91.

(479) Soper, H. E., “On the Probable Error of the Bi-serial Expression for the
Correlation Coefficient,” Biometrika, vol. 10, 1914, p. 384.

(480) Soper, H. E., “Sampling Moments of Moments of Samples of n Units each
drawn from an Unchanging Sampled Population, from the Point of View of
Semi-invariants,” Jour. Roy. Stat. Soc., vol. 93, 1930, p. 104.

(481) “Student,” “An Experimental Determination of the Probable Error of Dr.
Spearman's Correlation Coefficients,” Biometrika, vol. 13, 1921, p. 200.

(482) “Student,” “On the Distribution of Means of Samples which are not drawn at
Random,” Biometrika, vol. 7, 1909, p. 210.

(483) Tchebycheff, P. L. de, “Des valeurs moyennes,” Jour. de Maths. (2), vol. 12,
1867, pp. 177-184.

(484) Tschuprow, A. A., “On the Mathematical Expectation of the Moments of
Frequency-distributions,” Biometrika, vol. 12, 1918-19, pp. 110 and 185, and
vol. 13, 1921, p. 288; and Metron, vol. 2, 1923, pp. 461 and 646.

(485) Wishart, J., “The Derivation of certain High-order Sampling Product-moments
from a Normal Population,” Biometrika, vol. 22, 1930, p. 221.

(486) Wishart, J., “Notes on Frequency Constants,” Jour. Inst. of Actuaries, vol. 62,
1931, p. 174.

(487) Wishart, J., “The Mean and Second-moment Coefficient of the Multiple
Correlation Coefficient in Samples from a Normal Population,” Biometrika, vol. 22,
1931, p. 353. (With an editorial appendix of tables of the mean value and
squared standard deviation of a multiple correlation coefficient.)

(488) Wishart, J., and M. S. Bartlett, “The Distribution of Second-order Moment
Coefficients in Small Samples,” Proc. Camb. Phil. Soc., vol. 28, 1932, p. 155.

On the problem of fluctuations of sampling in correlations between time-
series, see also Yule, (332).

(489) Young, Andrew W., and Karl Pearson, “On the Probable Error of a Coefficient of
Contingency without Approximation,” Biometrika, vol. 11, 1915-17, p. 215.

(490) Yule, G. U., “On the Theory of Correlation for Any Number of Variables treated
by a New System of Notation,” Proc. Roy. Soc., Series A, vol. 79, 1907, p. 182.
(See pp. 192-193 at end.)

(491) Yule, G. U., “On the Methods of Measuring Association between Two Attributes,”
Jour. Roy. Stat. Soc., vol. 75, 1912. (Probable error of the correlation coefficient
for a fourfold table, of association coefficients, etc.)

Reference may also be made to the following, which deal for the most part
with the effects of errors other than errors of sampling:

(492) Bowley, A. L., “Relations between the Accuracy of an Average and that of its
Constituent Parts,” Jour. Roy. Stat. Soc., vol. 60, 1897, p. 855.

(493) Bowley, A. L., “The Measurement of the Accuracy of an Average,” Jour. Roy.
Stat. Soc., vol. 75, 1911, p. 77.

CHAPTER 22. The χ² Distribution.

(494) Bowley, A. L., and L. R. Connor, “Tests of Correspondence between Statistical
Grouping and Formulae,” Economica, 1923, p. 1.

(495) Fisher, R. A., “On the Interpretation of χ² from Contingency Tables, and the
Calculation of P,” Jour. Roy. Stat. Soc., vol. 85, 1922, p. 87.





(496) Fisher, R. A., “On the Mathematical Foundations of Theoretical Statistics,”
Phil. Trans., Series A, vol. 222, 1922, pp. 309-368.

(497) Fisher, R. A., “The Conditions under which χ² measures the Discrepancy between
Observation and Hypothesis,” Jour. Roy. Stat. Soc., vol. 87, 1924, p. 412.

(498) Fisher, R. A., “Statistical Tests of Agreement between Observation and
Hypothesis” (with a note in reply by A. L. Bowley), Economica, 1923, p. 139.

(499) Irwin, J. O., “Note on the χ² Test for Goodness of Fit,” Jour. Roy. Stat. Soc.,
vol. 92, 1929, p. 204.

(500) Neyman, J., and E. S. Pearson, “On the Use and Interpretation of Certain Test
Criteria for Purposes of Statistical Inference,” Biometrika, vol. 20A, 1928,
pp. 175 and 203.

(501) Neyman, J., and E. S. Pearson, “Further Notes on the χ² Distribution,”
Biometrika, vol. 22, 1931, pp. 298-305.

(502) Pearson, Karl, “On the Criterion that a Given System of Deviations from the
Probable in the case of a Correlated System of Variables is such that it can
be reasonably supposed to have arisen from Random Sampling,” Phil. Mag.,
vol. 50, Series 5, 1900, pp. 157-175.

(503) Pearson, Karl, “Multiple Cases of Disease in the Same House,” Biometrika,
vol. 9, 1913, p. 28. (A modification of the goodness of fit test to cover such
statistics as those indicated by the title.)

(504) Pearson, Karl, “On the Application of Goodness of Fit Tables to Test Regression
Curves and Theoretical Curves to Describe Observational or Experimental
Data,” Biometrika, vol. 11, 1915, p. 239.

(505) Pearson, Karl, “On a Brief Proof of the Fundamental Formula for Testing the
Goodness of Fit of Frequency-distribution and on the Probable Error of P,”
Phil. Mag., vol. 31 (6th Series), 1916, p. 369.

(506) Pearson, Karl, “On the χ² Test of Goodness of Fit,” Biometrika, vol. 14, 1922,
p. 186; and “Further Note,” ibid., p. 418.

(507) Pearson, Karl, “Note on the Relation of the (P, χ²) Test to the Distribution
of Standard Deviations in Samples from a Normal Population,” Biometrika,
vol. 19, 1927, p. 215.

(508) Pearson, Karl, “Experimental Discussion of the (χ², P) Test for Goodness of Fit,”
Biometrika, vol. 24, 1932, pp. 351-381.

(509) Robinson, Selby, “An Experiment regarding the χ² Test,” Ann. Math. Stats.,
vol. 4, 1933, p. 285.

(510) Sheppard, W. F., “The Fit of a Formula for Discrepant Observations,” Phil.
Trans., Series A, vol. 228, 1927, p. 115.

(511) Yule, G. Udny, “On the Application of the χ² Method to Association and
Contingency Tables, with Experimental Illustrations,” Jour. Roy. Stat. Soc.,
vol. 85, 1922, p. 95.


CHAPTER 23. Sampling of Variables—Small Samples. 

(Including some references to the theory of statistical inference.)

(512) Baker, George A., “The Significance of the Product-moment Coefficient, with
special reference to the Marginal Distributions,” Jour. Amer. Stat. Assoc.,
vol. 25, 1930, p. 387; and the related Paper: Pearson, Egon S., “The Test
of the Significance for the Correlation Coefficient,” Jour. Amer. Stat. Assoc.,
vol. 26, 1931, p. 128.

(513) Baker, George A., “The Relation between the Means and Variances, Means
Squared and Variances in Samples from Combinations of Normal Populations,”
Ann. Math. Stats., vol. 2, 1931, p. 383.

(514) Bayes, T., “An Essay towards Solving a Problem in the Doctrine of Chances,”
Phil. Trans., vol. 53, 1763, p. 370.

(515) Berkson, Joseph, “Bayes' Theorem,” Ann. Math. Stats., vol. 1, 1930, p. 42.

(516) Bowley, A. L., “F. Y. Edgeworth's Contributions to Mathematical Statistics,”
published by the Royal Statistical Society, 1928.

(517) Camp, Burton H., “A New Generalisation of Tchebycheff's Statistical
Inequality,” Bull. Amer. Math. Soc., vol. 28, 1922.

(518) Camp, Burton H., “Problems in Sampling,” Jour. Amer. Stat. Assoc., vol. 18,
1923, p. 964.




(519) Cheshire, L., E. Oldis, and E. S. Pearson, “Further Experiments on the
Sampling Distribution of the Correlation Coefficient,” Jour. Amer. Stat. Assoc.,
vol. 27, 1932, p. 121.

(520) Church, A. E. R., “On the Moments of the Distributions of Squared Standard
Deviations for Samples of N drawn from an Indefinitely Large Population,”
Biometrika, vol. 17, 1925, p. 79.

(521) Church, A. E. R., “On the Means and Squared Standard Deviations of Small
Samples from any Population,” Biometrika, vol. 18, 1926, p. 321.

(522) Craig, C. C., “Sampling when the Parent Population is of Pearson’s Type III,”
Biometrika, vol. 21, 1929, p. 287.

(523) Dodd, E. L., “The Convergence of General Means and the Invariance of Form of
certain Frequency Functions,” Amer. Jour. Math., vol. 49, 1927.

(524) Dodd, E. L., “The Greatest and the Least Variate under General Laws of Error,”
Trans. Amer. Math. Soc., vol. 25, 1923, p. 525.

(525) Dodd, E. L., “The Convergence of a General Mean of Measurements to the True
Value,” Bull. Amer. Math. Soc., vol. 32, 1926.

(526) Ezekiel, Mordecai, “The Sampling Variability of Linear and Curvilinear
Regression,” Ann. Math. Stats., vol. 1, 1930, p. 275.

(527) Fisher, R. A., “Inverse Probability,” Proc. Camb. Phil. Soc., vol. 26, 1930,
p. 528.

(528) Fisher, R. A., “Inverse Probability and the Use of Likelihood,” Proc. Camb.
Phil. Soc., vol. 28, 1932, p. 257.

(529) Fisher, R. A., “On the Probable Error of a Coefficient of Correlation deduced
from a Small Sample,” Metron, vol. 1, No. 4, 1921, p. 3. (See also refs. (181)
and (435).)

(530) Fisher, R. A., “The General Sampling Distribution of the Multiple Correlation
Coefficient,” Proc. Roy. Soc., A, vol. 121, 1928, p. 654.

(531) Fisher, R. A., “Moments and Product-moments of Sampling Distributions,”
Proc. Lond. Math. Soc., vol. 30, 1928, p. 199.

(532) Fisher, R. A., and L. H. C. Tippett, “Limiting Forms of the Frequency-distribution
of the Largest or Smallest Member of a Sample,” Proc. Camb. Phil. Soc.,
vol. 24, 1928, p. 180.

(533) Fisher, R. A., “On the Mathematical Foundations of Theoretical Statistics,”
Phil. Trans., A, vol. 222, 1922, p. 309.

(534) Fisher, R. A., “The Theory of Statistical Estimation,” Proc. Camb. Phil. Soc.,
vol. 22, 1925, p. 700.

(535) Fisher, R. A., “On a Distribution Yielding the Error Functions of Several
Well-known Statistics,” Proc. International Math. Congress at Toronto, 1924,
p. 805.

(536) Fisher, R. A., “Applications of ‘Student’s’ Distribution” (and following tables
by “Student”), Metron, vol. 5, No. 3, 1927, p. 90.

(537) Greenwood, M., and L. Isserlis, “A Historical Note on the Problem of Small
Samples,” Jour. Roy. Stat. Soc., vol. 90, 1927, p. 347.

(538) Hall, Philip, “The Distribution of Means for Samples of Size N drawn from a
Population in which the Variate takes Values between 0 and 1, all such Values
being Equally Probable,” Biometrika, vol. 19, 1927, p. 240.

(539) Hotelling, H., “The Generalisation of ‘Student’s’ Ratio,” Ann. Math. Stats.,
vol. 2, 1931, p. 360.

(540) Hotelling, H., and Margaret Pabst, “Rank Correlation and Tests of Significance
involving No Assumption of Normality,” Ann. Math. Stats., vol. 7, 1936,
p. 29.

(541) Irwin, J. O., “Mathematical Theorems involved in the Analysis of Variance,”
Jour. Roy. Stat. Soc., vol. 94, 1931, p. 284.

(542) Irwin, J. O., “On the Frequency-distribution of the Means of Samples from a
Population having Any Law of Frequency with Finite Moments, etc.,”
Biometrika, vol. 19, 1927, p. 225, and vol. 21, 1929, p. 133.

(543) Irwin, J. O., “On the Frequency-distribution of Any Number of Deviates from
the Mean of a Sample from a Normal Population and the Partial Correlations
between them,” Jour. Roy. Stat. Soc., vol. 92, 1929, p. 580.

(544) Isserlis, L., “On the Value of a Mean as calculated from a Sample,” Jour. Roy.
Stat. Soc., vol. 81, 1918, p. 75.

(545) Le Roux, J. M., “A Study of the Distribution of Variance in Small Samples,”
Biometrika, vol. 23, 1931, pp. 134-190.






(546) Meidell, M. Birger, “Sur un problème du calcul des probabilités et les
statistiques mathématiques,” Comptes rendus, vol. 175, 1922, p. 806.

(547) Molina, E. C., “Bayes’ Theorem: An Expository Presentation,” Ann. Math.
Stats., vol. 2, 1931, p. 25.

(548) Neyman, J., “Contributions to the Theory of Small Samples drawn from a
Finite Population,” Revue Mensuelle de Statistique, Office Central de Stat. de
la République Polonaise, vol. 6, p. 1; reproduced in Biometrika, vol. 17, 1925,
p. 472.

(549) Neyman, J., and E. S. Pearson, “On the Use and Interpretation of Certain
Test Criteria for Purposes of Statistical Inference,” Biometrika, vol. 20A, 1928
and 1929, pp. 175 and 263.

(550) Neyman, J., and E. S. Pearson, “On the Problem of k Samples,” Bull. de
l’Acad. polonaise des Sci. et des Lettres, Série A, 1931, p. 460.

(551) Neyman, J., and E. S. Pearson, “On the Testing of Statistical Hypotheses in
relation to Probability a priori,” Proc. Camb. Phil. Soc., vol. 29, 1933, p. 492.

(552) Pearson, Egon S., and N. K. Adyanthaya, “The Distribution of Frequency
Constants in Small Samples from Non-normal Symmetrical and Skew Populations,”
Preliminary Notice, Biometrika, vol. 20A, 1928, p. 356, and Second
Paper, “Distribution of ‘Student’s’ z,” Biometrika, vol. 21, 1929, p. 259.

(553) Pearson, Egon S., “Some Notes on Sampling Tests with Two Variables,”
Biometrika, vol. 21, 1929, p. 337.

(554) Pearson, E. S., “The Test of Significance for the Correlation Coefficient,” Jour.
Amer. Stat. Assoc., vol. 26, 1931, p. 128.

(555) Pearson, Egon S., and J. Neyman, “On the Problem of Two Samples,” Bull.
de l’Acad. polonaise des Sci. et des Lettres, Série A, 1930, p. 73.

(556) Pearson, E. S., “The Analysis of Variance in Cases of Non-normal Variation,”
Biometrika, vol. 23, 1931, pp. 114-133.

(557) Pearson, E. S., “The Test of Significance for the Correlation Coefficient: Some
Further Results,” Jour. Amer. Stat. Assoc., vol. 27, 1932, p. 424.

(558) Pearson, E. S., “Sampling Problems in Industry,” Jour. Roy. Stat. Soc., Suppl.,
vol. 1, 1934, p. 107.

(559) Pearson, Karl, “On the Distribution of the Standard Deviation in Small
Samples,” Biometrika, vol. 10, 1915, p. 522.

(560) Pearson, Karl, “The Fundamental Problem of Practical Statistics,” Biometrika,
vol. 13, 1920, p. 1.

(561) Pearson, Karl, “Further Contributions to the Theory of Small Samples,”
Biometrika, vol. 17, 1925, p. 176.

(562) Pearson, Karl, “Another Historical Note on the Theory of Small Samples,”
Biometrika, vol. 19, 1927, p. 207.

(562a) Pearson, Karl, G. B. Jeffery and E. M. Elderton, “On the Distribution
of the First Product-moment Coefficient in Small Samples drawn from an
Indefinitely Large Normal Population,” Biometrika, vol. 21, 1929, p. 164.

(563) Pearson, Karl, “Some Properties of ‘Student’s’ z,” Biometrika, vol. 23, 1931,
p. 1.

(564) Pearson, Karl, and Brenda Stoessiger, “Tables of the Probability Integrals
of Symmetrical Frequency-curves in the case of Low Powers such as arise
in the Theory of Small Samples,” Biometrika, vol. 22, 1931, p. 253.

(565) Rider, Paul R., “On Small Samples from certain Non-normal Universes,” Ann.
Math. Stats., vol. 2, 1931, p. 48.

(566) Rider, Paul R., “A Note on Small Sample Theory,” Jour. Amer. Stat. Assoc.,
vol. 26, 1931, p. 172.

(567) Rider, Paul R., “On the Distribution of the Ratio of Mean to Standard Deviation
in Small Samples from Non-normal Universes,” Biometrika, vol. 21,
1929, p. 124.

(567a) Rider, Paul R., “A Survey of the Theory of Small Samples,” Ann. Maths.,
Oct. 1930, p. 577.

(568) Rider, Paul R., “On the Distribution of the Correlation Coefficient in Small
Samples,” Biometrika, vol. 24, 1932, p. 382.

(569) Rietz, H. L., “Comments on the Applications of the Recently Developed Theory
of Small Samples,” Jour. Amer. Stat. Assoc., vol. 26, 1931, p. 150.

(570) Romanovsky, V., “Sulla probabilità a posteriori,” Giorn. dell’ Istituto Italiano degli
Attuari, anno 2, 1931.

(571) Romanovsky, V., “On the Criteria that Two Given Samples belong to the
Same Normal Population,” Metron, vol. 7, 1928, part 3, p. 3.




(572) Romanovsky, V., “On the Moments of Means of Functions of One and More
Random Variables,” Metron, vol. 8, part 1, 1929, p. 251.

(572a) Shewhart, W. A., and F. W. Winters, “Small Samples: New Experimental
Results,” Jour. Amer. Stat. Assoc., vol. 23, 1928, pp. 144-153.

(573) Shohat, J. (Jacques Chokhate), “Inequalities for Moments of Frequency Functions
and for Various Statistical Constants,” Biometrika, vol. 21, 1929, p. 361.

(574) Smith, C. D., “On Generalised Tchebycheff Inequalities in Mathematical
Statistics,” Amer. Jour. Math., vol. 52, No. 1, 1930.

(575) Snedecor, G. W., Calculation and Interpretation of Analysis of Variance and
Covariance; Collegiate Press, Ames, Iowa, 1934.

(576) Soper, H. E., “The General Sampling Distribution of the Multiple Correlation
Coefficient,” Jour. Roy. Stat. Soc., vol. 92, 1929, p. 445.

(577) Soper, H. E., and Others, “On the Distribution of the Correlation Coefficient in
Small Samples,” Biometrika, vol. 11, 1916-17, p. 328.

(578) “Sophister,” “Discussion of Small Samples from an Infinite Skew Universe,”
Biometrika, vol. 20A, 1928, pp. 389-423.

(579) “Student,” “On the Probable Error of a Mean,” Biometrika, vol. 6, 1908, p. 1.

(580) “Student,” “On the Probable Error of a Correlation Coefficient,” Biometrika,
vol. 6, 1908, p. 302. (The problem of the probable error with small samples.)

(581) “Student,” “On the z-Test”; followed by Karl Pearson, “Further Remarks
on the z-Test,” Biometrika, vol. 23, 1931, pp. 407-415.

(582) Tschuprow, A. A., “On the Asymptotic Frequency-distributions of the Arithmetic
Means of n Correlated Observations for Very Great Values of n,” Jour.
Roy. Stat. Soc., vol. 88, 1925, p. 91.

(583) Wilks, S. S., “Certain Generalisations in the Analysis of Variance,” Biometrika,
vol. 24, 1932, p. 471.

(584) Wishart, John, “The Generalised Product-moment Distribution in Samples
from a Normal Multivariate Population,” Biometrika, vol. 20A, 1928, p. 32.

(585) Wishart, John, “The Correlation between Product-moments of Any Order in
Samples from a Normal Population,” Proc. Roy. Soc. Edin., vol. 49, 1929, p. 1.

(586) Woo, T. L., “Tables for ascertaining the Significance or Non-significance of
Association Measured by the Correlation Ratio,” Biometrika, vol. 21, 1929, p. 1.

CHAPTER 24. Interpolation and Graduation.

(587) “Interpolation and Allied Tables,” Reprint from Nautical Almanac for 1937;
His Majesty’s Stationery Office, 1936.

(588) Pearson, Karl, Tracts for Computers, II and III, On the Construction of Tables
and on Interpolation; Cambridge University Press, 1920.

(589) Steffensen, J. F., Some Recent Researches in the Theory of Statistics and Actuarial
Science; Cambridge: published for the Institute of Actuaries by the University
Press, 1930.

(590) Steffensen, J. F., Interpolation; Williams & Wilkins Co., Baltimore, 1927.

(591) Whittaker and Robinson, The Calculus of Observations; Blackie & Son, London;
2nd Ed., 1932.

The student who wishes to proceed further with the subject will probably
find the last work cited the best for general use: it includes, of course, much
besides interpolation. But (590) is very valuable for the advanced worker.
All students are recommended to read the second lecture in the small work
given under (589).

One can hardly give specific references, but the student will find much that
is useful in the official publications of our own and other countries dealing with
the construction of life-tables.


TABLES.

A. Tables Useful in Calculation.

(592) Barlow’s Tables of Squares, Cubes, Square-roots, Cube-roots and Reciprocals of
all Integer Numbers up to 10,000; E. & F. N. Spon, London and New York;
new edition, 1930.

(593) Cotsworth, M. B., The Direct Calculator, Series O. (Product table to 1000 x
1000.) M’Corquodale & Co., London.





(594) Crelle, A. L., Rechentafeln. (Multiplication table giving all products up to
1000 x 1000.) Can be obtained with explanatory introduction in German or
in English. G. Reimer, Berlin.

(595) Elderton, W. P., “Tables of Powers of Natural Numbers, and of the Sums of
Powers of the Natural Numbers from 1 to 100” (gives powers up to seventh),
Biometrika, vol. 2, p. 474; reproduced in (598).

(596) Peters, J., Neue Rechentafeln für Multiplikation und Division. (Gives products
up to 100 x 10,000: more convenient than Crelle for forming four-figure products.
Introduction in English, French or German.) G. Reimer, Berlin.

(597) Zimmermann, H., Rechentafel, nebst Sammlung häufig gebrauchter Zahlenwerthe.
(Products of all numbers up to 100 x 1000: subsidiary tables of squares, cubes,
square-roots, cube-roots and reciprocals, etc. for all numbers up to 1000 at the
foot of the page.) W. Ernst & Son, Berlin; English edition, Asher & Co.,
London.

A number of useful tables will be found in the series “Tracts for Computers,”
published by the Cambridge University Press for the Department of Applied
Statistics, University College, London. A list is usually given in the advertisement
pages of the current issue of Biometrika.

B. Tables Useful in Statistical Work.

The more advanced student will probably find it indispensable to possess:

(598) Tables for Statisticians and Biometricians, Part I (edited by Karl Pearson),
price 15s., from the Biometrika Office, University College, London, W.C. 1.

(599) Part II, price 30s., obtainable from the same address, contains tables of a
more advanced character.

The following tables also contain much that is useful for modern statistical
work:

(600) Tables of the Complete and Incomplete B-Function (edited by Karl Pearson), price
55s.

(601) Tables of the Incomplete Γ-Function (edited by Karl Pearson), price 42s.

(602) Tables of the Complete and Incomplete Elliptic Integrals, price 12s. 6d.

The above are obtainable from the Biometrika Office, University College,
London, W.C. 1.

(603) Tracts for Computers, No. 1, Tables of the Digamma and Trigamma Functions,
price 3s.

(604) Tracts for Computers, Nos. 4, 8 and 9, Logarithms of the Complete Γ-Function.

(605) Tracts for Computers, No. 15, Random Sampling Numbers, by L. H. C. Tippett,
price 3s. 9d.

(606) British Association Mathematical Tables, vol. 1, London, 1931, Office of the
British Association, Burlington House, London, W. 1, price 10s., post free.
(Circular and Hyperbolic Functions; Exponential Sine and Cosine Integrals;
Factorial (Gamma) and Derived Functions; Integrals of Probability Integral.)

(607) British Association Mathematical Tables, vol. 6, London, 1936, price 10s. Bessel
Functions, Part 1, Functions of Order 0 and 1.

(608) Tables of the Higher Mathematical Functions (edited by H. T. Davis), Principia
Press, Bloomington, Indiana. (London: Williams & Norgate).
Part I, price 25s. (Historical Introduction, Tables of Γ- and Digamma-Functions.)

(609) Part II, price 25s. (Tables of the Trigamma, Tetragamma, Pentagamma and
Hexagamma Functions, of Bernoulli and Euler Numbers, of certain numbers
facilitating the fitting of a polynomial.)

(610) Kelley, T. L., “Tables to facilitate the Calculation of Partial Coefficients of
Correlation and Regression Equations,” Bulletin of the University of Texas,
No. 27, 1916. (Tables giving the values of $1/\sqrt{(1-r_{13}^{2})(1-r_{23}^{2})}$ and
$r_{13}r_{23}/\sqrt{(1-r_{13}^{2})(1-r_{23}^{2})}$.)

(611) Miner, J. R., Tables of √(1 − r²) and 1 − r² for use in Partial Correlation, etc.; The
Johns Hopkins Press, Baltimore, 1922. (Six-figure tables.)

(612) Salvosa, L. R., “Tables of Pearson’s Type III Function,” Ann. Math. Stats.,
vol. 1, 1930, p. 191.





References to Italian Literature.

In some respects the methods developed by the active school of Italian
writers have diverged a good deal from those of English and American
writers. The following bibliography, prepared by the kindness of Dr
Silvio Orlandi, Manager of Metron, will serve as a guide to the student
who wishes to broaden his outlook by making himself acquainted with
such methods.

Books.

(613) Benini, R., Principi di statistica metodologica; Unione Tipografica Editrice
Torinese, Torino, 1920.

(614) Boldrini, M., Statistica: Appunti per gli studenti, voll. 2; Giuffrè, Milano,
1931-35.

(615) Gini, C., Appunti di statistica metodologica; Libreria Castellani, Roma, 1930-31.
Traduzione spagnola: “Curso de Estadística” (con un apéndice matemático
por L. Galvani), Enciclopedia de Ciencias Jurídicas y Sociales, Editorial Labor
S.A., Barcelona, 1935.

(616) Livi, L., Elementi di statistica; “Cedam,” Padova, 1921.

(617) Mortara, G., “Lezioni di statistica metodologica,” Edite dal Giornale degli
Economisti e Rivista di Statistica, Città di Castello, 1922.

(618) Niceforo, A., Il metodo statistico; French translation, La Méthode
statistique; Marcel Giard, Paris, 1925.

(619) Pietra, G., Statistica, voll. 1 e 2; Giuffrè, Milano, 1931.

See also

(620) Trattato Elementare di Statistica, diretto da C. Gini; Giuffrè, Milano, 1930. Vol. I,
Statistica Metodologica; Vol. II, Demografia; Vol. III, Antropometria e Biometria;
Vol. IV, Statistica Economica; Vol. V, Statistica Economica; Vol. VI,
Statistica Sociale.

General.

(621) Gini, C., “The Contributions of Italy to Modern Statistical Methods,” Journal of
the Royal Statistical Society, London, 1926.

(622) Gini, C., “Present Conditions and Future Progress of Statistics,” Journal of the
American Statistical Association, 1930.

Graphical Representation.

(623) Gini, C., “Sull’ utilità delle rappresentazioni grafiche,” Giornale degli Economisti
e Rivista di Statistica, 1914.

(624) Gini, C., “Two Remarks on Graphs,” The Indian Journal of Statistics, vol. 1,
August 1934.

Interpolation and Extrapolation.

(625) Cantelli, F. P., Sull’ adattamento di curve ad una serie di misure o di osservazioni,
Roma, 1905.

(626) Gini, C., “Considerazioni sull’ interpolazione e la perequazione delle serie
statistiche,” Metron, vol. 1, fasc. 1, 1921.

(627) Gini, C., “Sull’ interpolazione di una retta quando i valori della variabile
indipendente sono affetti da errori accidentali,” Metron, vol. 1, fasc. 4, 1921.

(628) Gini, C., “Ricerche sperimentali nel campo della interpolazione di serie
statistiche,” Atti del R. Istituto Veneto di Scienze, Lettere ed Arti, 1923.

(629) Mouno, R., “Di un metodo di interpolazione statistica,” Metron, vol. 12, fasc. 2,
1934.

(630) Pietra, G., “Interpolating Plane Curves,” Metron, vol. 3, fasc. 3-4, 1924.

(631) Pietra, G., “Dell’ interpolazione parabolica nel caso in cui entrambi i valori delle
variabili sono affetti da errori accidentali,” Metron, vol. 9, fasc. 3-4, 1932.

(632) Salvemini, T., “Ricerche sperimentali sull’ interpolazione grafica di istogrammi,”
Metron, vol. 11, fasc. 4, 1934.




(633) Tedeschi, B., “Nuovo contributo al problema della interpolazione lineare,”
Giornale dell’ Istituto Italiano degli Attuari, vol. 5, n. 2-3, 1934.

(634) Veronese, G., Contributo alle ricerche sperimentali nel campo dell’ interpolazione
statistica; Padova.


Means, etc.

(635) Galvani, L., “Sulla determinazione del centro di gravità e del centro mediano
di una popolazione, con applicazione alla popolazione italiana censita al
1° dicembre 1921,” Metron, vol. 11, n. 1, 1933.

(636) Gini, C., and L. Galvani, “Di talune estensioni del concetto di media ai caratteri
qualitativi,” Metron, vol. 8, n. 1-2.

(637) Gini, C., M. Boldrini and A. Venere, “Sui centri della popolazione e sulle loro
applicazioni,” Metron, vol. 11, n. 2.


Frequency and Probability.

(638) Cantelli, F. P., “Sulla legge dei grandi numeri,” Memorie della R. Accad. dei
Lincei, 1916.

(639) Cantelli, F. P., “Sulla probabilità come limite della frequenza,” Rendiconti della
R. Accad. dei Lincei, 1917.

(640) Gini, C., “Che cos’ è la probabilità,” Rivista di Scienza, 1908.

(641) Gini, C., “Il sesso dal punto di vista statistico,” Cap. IV, pagg. 70 120, 125 101,
Istituto di Statistica della R. Università di Roma.

(642) Gini, C., “Considerazioni sulle probabilità a posteriori e applicazione al rapporto
dei sessi nelle nascite umane,” Studi Economico-Giuridici della R. Università
di Cagliari, 1911.


Variation and Concentration: “Transvariazione.”

(643) Cantelli, F. P., “Sulla differenza media con ripetizione,” Giornale degli Economisti
e Rivista di Statistica, February 1913.

(644) Castellano, V., “Sulle relazioni tra curve di frequenza e curve di concentrazione
e sui rapporti di concentrazione corrispondenti a determinate distribuzioni,”
Metron, vol. 10, n. 4, 1933.

(645) Castellano, V., “Sugli indici relativi di variabilità e sulla concentrazione dei
caratteri con segno,” Metron, vol. 10, n. 1.

(646) de Finetti, B., “Sui metodi proposti per il calcolo della differenza media,”
Metron, vol. 9, n. 1, 1931.

(647) de Finetti, B., and Paciello, U., “Calcolo della differenza media,” Metron, vol. 8,
n. 3, 1930.

(648) de Vergottini, M., Relazioni fra gli indici di variabilità dei fenomeni collettivi
composti e quelli dei fenomeni collettivi semplici; Failli, Roma, 1000.

(649) Galvani, L., “Contributi alla determinazione degli indici di variabilità per alcuni
tipi di distribuzione,” Metron, vol. 9, n. 1, 1931.

(650) Galvani, L., “Sulle curve di concentrazione relative a caratteri limitati e non
limitati,” Metron, vol. 10, n. 3, 1932.

(651) Gini, C., “Variabilità e Mutabilità, contributo allo studio delle distribuzioni e
relazioni statistiche,” Studi Economico-Giuridici della R. Università di Cagliari,
1912.

(652) Gini, C., “Indici di concentrazione e di dipendenza,” Biblioteca dell’ Economista,
5a serie, 1910.

(653) Gini, C., “Sulla misura della concentrazione e della variabilità dei caratteri,”
Atti del R. Istituto Veneto di Scienze, Lettere ed Arti, 1914.

(654) Gini, C., “Il concetto di transvariazione e le sue prime applicazioni,” Giornale
degli Economisti e Rivista di Statistica, 1916.

(655) Gini, C., “Di una estensione del concetto di scostamento medio e di alcune
applicazioni alla misura della variabilità dei caratteri qualitativi,” Atti del R. Istituto
Veneto di Scienze, Lettere ed Arti, 1918.

(656) Gini, C., “Sul massimo degli indici di variabilità assoluta e sulle sue applicazioni
agli indici di variabilità relativa e al rapporto di concentrazione,” Metron, vol.
8, n. 3, 1930.

(657) Gini, C., “Intorno alle curve di concentrazione,” Metron, vol. 9, n. 3-4, 1932.




(658) Gini, C., “Sull’ influenza che il raggruppamento delle singole modalità esercita
sul valore di alcuni indici statistici nel caso di serie sconnesse,” Metron,
vol. 12, n. 4, 1936.

(659) Pietra, G., Appunti intorno alla misura della variabilità e della concentrazione dei
caratteri; Bertero, Roma, 1915.

(660) Pietra, G., “Delle relazioni tra gli indici di variabilità,” Atti del R. Istituto Veneto
di Scienze, Lettere ed Arti, 1914-15, Parti I e II.

(661) Pietra, G., “Intorno alla discordanza tra gli indici di variabilità e di
concentrazione,” XXII Sessione dell’ Istituto Internazionale di Statistica, Londra, 1934.

(662) Savorgnan, F., “Intorno all’ approssimazione di alcuni indici della distribuzione
dei redditi,” Atti del R. Istituto Veneto di Scienze, Lettere ed Arti, 1915.

(663) Vinci, F., “Sui coefficienti di variabilità,” Metron, vol. 1, n. 1, 1920.


Index-numbers and Other Statistical Measures.

(664) Gini, C., “Intorno al metodo dei residui dello Stuart Mill,” Studi Economico-Giuridici
della R. Università di Cagliari, 1910.

(665) Gini, C., “Quelques considérations au sujet de la construction des nombres
indices des prix et des questions analogues. Contribution à l’étude des
méthodes d’élimination,” Metron, vol. 4, n. 1, 1924.

(666) Gini, C., “On the Circular Test of Index-numbers,” Metron, vol. 9, n. 2, 1931.

(667) Gini, C., “Tavole di mortalità della popolazione italiana” (in collaborazione con
L. Galvani), Annali di Statistica, Serie 6, vol. 8, 1931.

(668) Gini, C., “Sur une méthode pour déterminer le nombre moyen des enfants
légitimes par mariages,” Revue de l’Institut International de Statistique, 1934.

(669) Gini, C., “Sur la mesure de la fécondité des mariages,” Bulletin de l’Institut
International de Statistique, 1934.

(670) Gini, C., “On a Method for Calculating the Infantile Death-rate according to
the Month of Death,” Revue de l’Institut International de Statistique, 1934.

(671) Gini, C., “Su la determinazione dei quozienti di eliminazione e in particolare sui
metodi delle durate esatte e delle durate medie nella ipotesi di saggi istantanei
di eliminazione costanti,” Metron, vol. 12, n. 3, 1935.

(672) Gini, C., “Methods of Eliminating the Influence of Several Groups of Factors,”
Econometrica, January 1937.


Statistical Relations.

(673) Gini, C., “Di una misura della dissomiglianza tra due gruppi di quantità e delle
sue applicazioni allo studio delle relazioni statistiche,” Atti del R. Istituto Veneto
di Scienze, Lettere ed Arti, 1914.

(674) Gini, C., “Nuovi contributi alla teoria delle relazioni statistiche,” Atti del R.
Istituto Veneto di Scienze, Lettere ed Arti, 1915.

(675) Gini, C., “Indici di omofilia e di rassomiglianza e loro relazioni col coefficiente di
correlazione e con gli indici di attrazione,” Atti del R. Istituto Veneto di Scienze,
Lettere ed Arti, 1915.

(676) Gini, C., “Sul criterio di concordanza tra due caratteri,” Atti del R. Istituto
Veneto di Scienze, Lettere ed Arti, 1916.

(677) Gini, C., “Indici di concordanza,” Atti del R. Istituto Veneto di Scienze, Lettere
ed Arti, 1916.

(678) Gini, C., “Sulle relazioni tra le intensità cograduate di due caratteri,” Atti del R.
Istituto Veneto di Scienze, Lettere ed Arti, 1917.

(679) Gini, C., “Sull’ influenza che il raggruppamento delle singole modalità esercita
sul valore di alcuni indici statistici nel caso di serie sconnesse,” Metron, vol. 12,
n. 4, 1936.

(680) Pietra, G., “The Theory of Statistical Relations, with Special Reference to
Cyclical Series,” Metron, vol. 4, n. 3-4, 1925.





REFERENCES SUPPLEMENTARY TO THE
ELEVENTH EDITION.

Since the publication of the eleventh edition several of the books
listed among the foregoing references have passed into further editions.
The principal items are as follows:

(681) Sir W. Palin Elderton’s Frequency Curves and Correlation, refs. (10), (148)
and (100), is now in its 3rd ed., 1938, and is published by the Cambridge
University Press.

(682) R. A. Fisher’s Statistical Methods for Research Workers, ref. (01), appeared in
a 7th ed. in 1938.

(683) L. H. C. Tippett’s The Methods of Statistics, ref. (09), appeared in a 2nd ed.
in 1937.

The two following are now available in English translations:

(684) R. von Mises’ book, ref. (10), as Probability, Statistics and Truth, William
Hodge, 1939.

(685) A. Tschuprow’s book, ref. (28), as The Mathematical Theory of Correlation,
William Hodge, 1939.

To the references on Probability on page 505 should be added:

(686) Borel, E., Traité du Calcul des Probabilités, a series of brochures written under
the general editorship of M. Borel and published by Gauthier-Villars, Paris.

(687) Cramér, H., Random Variables and Probability Distributions, Cambridge University
Press, 1937.

(688) Lévy, P., Calcul des Probabilités, Gauthier-Villars, 1925.

To the references to tables on page 525 should be added:

(689) David, F. N., Tables of the Correlation Coefficient, Biometrika Office, University
College, London, 1938.

(690) Fisher, R. A., and Yates, F., Statistical Tables for Biological, Agricultural
and Medical Research, Oliver & Boyd, 1938.

(691) Kelley, T. L., The Kelley Statistical Tables, Macmillan, 1938.






APPENDIX TABLES 


531 


APPENDIX TABLE 1. 


Normal Curve. Ordinates of the Normal Curve of Errors of Unit Area at every Tenth of
the Standard Deviation, with First and Second Differences. The value of the central
ordinate at zero is 1/√(2π).

x/σ      y       Δ¹(−)      Δ²        x/σ      y       Δ¹(−)      Δ²

0.0   0.39894     199     -392        2.5   0.01753     395      +79
0.1   0.39695     591     -374        2.6   0.01358     316      +66
0.2   0.39104     965     -347        2.7   0.01042     250      +53
0.3   0.38139    1312     -308        2.8   0.00792     197      +45
0.4   0.36827    1620     -265        2.9   0.00595     152      +36
0.5   0.35207    1885     -212        3.0   0.00443     116      +27
0.6   0.33322    2097     -159        3.1   0.00327      89      +23
0.7   0.31225    2256     -104        3.2   0.00238      66      +17
0.8   0.28969    2360      -52        3.3   0.00172      49      +13
0.9   0.26609    2412        0        3.4   0.00123      36      +10
1.0   0.24197    2412      +46        3.5   0.00087      26       +7
1.1   0.21785    2366      +84        3.6   0.00061      19       +6
1.2   0.19419    2282     +118        3.7   0.00042      13       +4
1.3   0.17137    2164     +143        3.8   0.00029       9       +2
1.4   0.14973    2021     +161        3.9   0.00020       7       +3
1.5   0.12952    1860     +173        4.0   0.00013       4
1.6   0.11092    1687     +177        4.1   0.00009       3
1.7   0.09405    1510     +177        4.2   0.00006       2
1.8   0.07895    1333     +170        4.3   0.00004       2
1.9   0.06562    1163     +162        4.4   0.00002
2.0   0.05399    1001     +150        4.5   0.00002
2.1   0.04398     851     +137        4.6   0.00001
2.2   0.03547     714     +120        4.7   0.00001
2.3   0.02833     594     +108        4.8   0.00000
2.4   0.02239     486      +91

Precision of Interpolation.— Owing to the magnitude of the second differences,
simple interpolation near the beginning of the table may give an error up to 
in the fourth place ; the use of second differences will bring this down to 1 or 
in the last place, third differences being small. Where third differences are 
greatest, in the neighbourhood of -0*0, the error may be as large as 3 in 
the last place unless the third difference is used. 
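
The columns of Appendix Table 1 can be checked numerically. What follows is a minimal Python sketch, not part of the original text: the function names `ordinate`, `difference_table` and `linear_interpolation` are hypothetical, and the differences are formed from the rounded five-figure entries, which is assumed to be how the printed columns were obtained.

```python
import math


def ordinate(x):
    """Ordinate of the unit-area normal curve at deviation x (in units of sigma)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def difference_table(values, places=5):
    """Round to `places` decimals and difference in units of the last place."""
    scaled = [round(v * 10 ** places) for v in values]
    first = [b - a for a, b in zip(scaled, scaled[1:])]
    second = [b - a for a, b in zip(first, first[1:])]
    return scaled, first, second


def linear_interpolation(x0, h, f0, f1, x):
    """Simple (first-difference) interpolation between two tabulated entries."""
    u = (x - x0) / h
    return f0 + u * (f1 - f0)


# Tabulate at every tenth of the standard deviation from 0.0 to 4.8.
xs = [i / 10 for i in range(49)]
y, d1, d2 = difference_table([ordinate(x) for x in xs])

# Error of simple interpolation half-way between the first two entries.
estimate = linear_interpolation(0.0, 0.1, 0.39894, 0.39695, 0.05)
error = ordinate(0.05) - estimate
```

Here `d1` carries its sign explicitly, so the first entry is -199 where the table prints 199 under the heading Δ¹(−); `error` shows how far simple interpolation falls short near the beginning of the table.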












APPENDIX TABLE 2. 


Normal Curve. The Proportion, A, of the Whole Area of the Normal Curve lying to the
Left of the Ordinate at Deviation x/σ, tabulated at every Tenth of the Standard Deviation,
with First and Second Differences.

x/σ      A       Δ¹(+)    Δ²(−)       x/σ      A       Δ¹(+)    Δ²(−)

0.0   0.50000    3983       40        2.5   0.99379     155       36
0.1   0.53983    3943       78        2.6   0.99534     119       28
0.2   0.57926    3865      114        2.7   0.99653      91       22
0.3   0.61791    3751      147        2.8   0.99744      69       17
0.4   0.65542    3604      175        2.9   0.99813      52       14
0.5   0.69146    3429      200        3.0   0.99865      38       10
0.6   0.72575    3229      219        3.1   0.99903      28        7
0.7   0.75804    3010      230        3.2   0.99931      21        7
0.8   0.78814    2780      240        3.3   0.99952      14        3
0.9   0.81594    2540      241        3.4   0.99966      11        4
1.0   0.84134    2299      239        3.5   0.99977       7
1.1   0.86433    2060      233        3.6   0.99984       5
1.2   0.88493    1827      223        3.7   0.99989       4
1.3   0.90320    1604      209        3.8   0.99993       2
1.4   0.91924    1395      194        3.9   0.99995       2
1.5   0.93319    1201      178        4.0   0.99997       1
1.6   0.94520    1023      159        4.1   0.99998       1
1.7   0.95543     864      143        4.2   0.99999
1.8   0.96407     721      124        4.3   0.99999
1.9   0.97128     597      108        4.4   0.99999
2.0   0.97725     489       93
2.1   0.98214     396       78
2.2   0.98610     318       66
2.3   0.98928     252       53
2.4   0.99180     199       44

A attains the exact value 0.99999 between 4.26 and 4.27.


Precision of Interpolation.— Simple interpolation may lead to an error of 3
or 4 at most in the fourth place of decimals in the region where second differences
are large; the use of the second difference will bring this down to 2 or 3 in the
last place, the largest errors tending to occur at the beginning of the table, where
the third difference may be used if the greatest possible precision is desired.
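
The effect of bringing in the second difference can be illustrated on one interval of Appendix Table 2. This is a minimal Python sketch under stated assumptions, not the authors' procedure: it uses the Newton forward-difference formula, the names `area` and `interpolate` are hypothetical, and the comparison value is taken from the standard library error function rather than from the table.

```python
import math


def area(x):
    """Proportion A of the normal curve lying to the left of deviation x (in sigma units)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def interpolate(x0, h, f0, d1, d2, x):
    """Newton forward formula carried to the second difference.

    f0 is the tabulated entry at x0, d1 and d2 the first and second
    differences on the same row; setting d2 = 0 gives simple interpolation.
    """
    u = (x - x0) / h
    return f0 + u * d1 + 0.5 * u * (u - 1.0) * d2


# Row x/sigma = 0.2 of the table: A = 0.57926, first difference +3865,
# second difference -114 (both in units of the fifth decimal place).
simple = interpolate(0.2, 0.1, 0.57926, 0.03865, 0.0, 0.25)
with_second = interpolate(0.2, 0.1, 0.57926, 0.03865, -0.00114, 0.25)

error_simple = abs(simple - area(0.25))
error_second = abs(with_second - area(0.25))
```

In this instance `error_simple` is of the order of one unit in the fourth decimal place, while `error_second` is a few units in the fifth, in line with the statement above.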








APPENDIX TABLE 3. 

Normal Curve. The Probability, P, of an Observation lying Outside the Limits ±xia in 
the Normal Curve of Knots. P ~ 2(1 - A), zvhere A is the area given by the preceding 
table. 


x/σ      P        Δ¹(−)   Δ²(+)      x/σ      P        Δ¹(−)   Δ²(+)

0.0    1.00000    7966      80       2.5    0.01242     310      71
0.1    0.92034    7886     156       2.6     .00932     239      57
0.2     .84148    7730     228       2.7     .00693     182      44
0.3     .76418    7502     294       2.8     .00511     138      35
0.4     .68916    7208     351       2.9     .00373     103      27
0.5     .61708    6857     399       3.0     .00270      76      19
0.6     .54851    6458     436       3.1     .00194      57      17
0.7     .48393    6022     463       3.2     .00137      40      10
0.8     .42371    5559     478       3.3     .00097      30      10
0.9     .36812    5081     483       3.4     .00067      20       5
1.0     .31731    4598     479       3.5     .00047      15
1.1     .27133    4119     465       3.6     .00032      10
1.2     .23014    3654     445       3.7     .00022       8
1.3     .19360    3209     419       3.8     .00014       4
1.4     .16151    2790     389       3.9     .00010       4
1.5     .13361    2401     354       4.0     .00006       2
1.6     .10960    2047     320       4.1     .00004       1
1.7     .08913    1727     284       4.2     .00003       1
1.8     .07186    1443     250       4.3     .00002       1
1.9     .05743    1193     216       4.4     .00001
2.0     .04550     977     185
2.1     .03573     792     156
2.2     .02781     636     131
2.3     .02145     505     107
2.4     .01640     398      88



P attains the exact value 0.00001 between 4.41 and 4.42.

Precision of Interpolation. Simple interpolation may lead to errors of 5
or 6 in the fourth place of decimals where second differences are large. Using
second differences as well, the error will not exceed about 5 in the last place,
near the beginning of the table, where the third difference may be brought in
if desired.
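
The relation P = 2(1 − A) stated in the heading of this table can be reproduced directly. The following is a minimal sketch, assuming the area A is computed from the error function rather than read from the preceding table; the function names are hypothetical.

```python
import math

def area_A(x):
    """Area of the normal curve to the left of x (x in units of sigma)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_outside(x):
    """Probability of an observation lying outside the limits +/- x."""
    return 2.0 * (1.0 - area_A(x))

p_one_sigma = prob_outside(1.0)   # compare with the entry for x/sigma = 1.0
p_two_sigma = prob_outside(2.0)   # compare with the entry for x/sigma = 2.0
```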










APPENDIX TABLE 4A. 


Values of the χ² Integral for One Degree of Freedom for Values of χ²
from χ² = 0 to χ² = 1 by steps of 0.01.


χ²        P         Δ          χ²        P         Δ

0       1.00000    7966        0.50    0.47950     436
0.01    0.92034    3280        0.51    0.47514     430
0.02    0.88754    2505        0.52    0.47084     423
0.03    0.86249    2101        0.53    0.46661     418
0.04    0.84148    1842        0.54    0.46243     411
0.05    0.82306    1656        0.55    0.45832     406
0.06    0.80650    1516        0.56    0.45426     400
0.07    0.79134    1404        0.57    0.45026     395
0.08    0.77730    1312        0.58    0.44631     389
0.09    0.76418    1235        0.59    0.44242     384
0.10    0.75183    1169        0.60    0.43858     379
0.11    0.74014    1111        0.61    0.43479     374
0.12    0.72903    1060        0.62    0.43105     369
0.13    0.71843    1015        0.63    0.42736     365
0.14    0.70828     974        0.64    0.42371     360
0.15    0.69854     938        0.65    0.42011     355
0.16    0.68916     905        0.66    0.41656     351
0.17    0.68011     874        0.67    0.41305     346
0.18    0.67137     845        0.68    0.40959     343
0.19    0.66292     820        0.69    0.40616     338
0.20    0.65472     795        0.70    0.40278     334
0.21    0.64677     773        0.71    0.39944     330
0.22    0.63904     752        0.72    0.39614     326
0.23    0.63152     731        0.73    0.39288     322
0.24    0.62421     713        0.74    0.38966     318
0.25    0.61708     696        0.75    0.38648     315
0.26    0.61012     679        0.76    0.38333     311
0.27    0.60333     663        0.77    0.38022     308
0.28    0.59670     648        0.78    0.37714     304
0.29    0.59022     634        0.79    0.37410     301
0.30    0.58388     620        0.80    0.37109     297
0.31    0.57768     607        0.81    0.36812     294
0.32    0.57161     595        0.82    0.36518     291
0.33    0.56566     583        0.83    0.36227     287
0.34    0.55983     572        0.84    0.35940     285
0.35    0.55411     560        0.85    0.35655     281
0.36    0.54851     551        0.86    0.35374     278
0.37    0.54300     540        0.87    0.35096     276
0.38    0.53760     530        0.88    0.34820     272
0.39    0.53230     521        0.89    0.34548     270
0.40    0.52709     512        0.90    0.34278     267
0.41    0.52197     503        0.91    0.34011     264
0.42    0.51694     495        0.92    0.33747     261
0.43    0.51199     487        0.93    0.33486     258
0.44    0.50712     479        0.94    0.33228     256
0.45    0.50233     471        0.95    0.32972     253
0.46    0.49762     463        0.96    0.32719     251
0.47    0.49299     457        0.97    0.32468     248
0.48    0.48842     449        0.98    0.32220     246
0.49    0.48393     443        0.99    0.31974     243
0.50    0.47950     436        1.00    0.31731     241






APPENDIX TABLE 4B. 


Values of the χ² Integral for One Degree of Freedom for Values of χ²
from χ² = 1 to χ² = 10 by steps of 0.1.


χ²        P         Δ          χ²        P         Δ

1.0     0.31731    2304        5.5     0.01902     106
1.1     0.29427    2095        5.6     0.01796      99
1.2     0.27332    1911        5.7     0.01697      94
1.3     0.25421    1749        5.8     0.01603      89
1.4     0.23672    1605        5.9     0.01514      83
1.5     0.22067    1477        6.0     0.01431      79
1.6     0.20590    1361        6.1     0.01352      74
1.7     0.19229    1258        6.2     0.01278      71
1.8     0.17971    1163        6.3     0.01207      66
1.9     0.16808    1078        6.4     0.01141      62
2.0     0.15730    1000        6.5     0.01079      59
2.1     0.14730     929        6.6     0.01020      56
2.2     0.13801     864        6.7     0.00964      52
2.3     0.12937     803        6.8     0.00912      50
2.4     0.12134     749        6.9     0.00862      47
2.5     0.11385     699        7.0     0.00815      44
2.6     0.10686     651        7.1     0.00771      42
2.7     0.10035     609        7.2     0.00729      39
2.8     0.09426     568        7.3     0.00690      38
2.9     0.08858     532        7.4     0.00652      35
3.0     0.08326     497        7.5     0.00617      33
3.1     0.07829     465        7.6     0.00584      32
3.2     0.07364     436        7.7     0.00552      30
3.3     0.06928     408        7.8     0.00522      28
3.4     0.06520     383        7.9     0.00494      26
3.5     0.06137     359        8.0     0.00468      25
3.6     0.05778     337        8.1     0.00443      24
3.7     0.05441     316        8.2     0.00419      23
3.8     0.05125     296        8.3     0.00396      21
3.9     0.04829     279        8.4     0.00375      20
4.0     0.04550     262        8.5     0.00355      19
4.1     0.04288     246        8.6     0.00336      18
4.2     0.04042     231        8.7     0.00318      17
4.3     0.03811     217        8.8     0.00301      16
4.4     0.03594     205        8.9     0.00285      15
4.5     0.03389     192        9.0     0.00270      14
4.6     0.03197     181        9.1     0.00256      14
4.7     0.03016     170        9.2     0.00242      13
4.8     0.02846     160        9.3     0.00229      12
4.9     0.02686     151        9.4     0.00217      12
5.0     0.02535     142        9.5     0.00205      10
5.1     0.02393     134        9.6     0.00195      11
5.2     0.02259     126        9.7     0.00184      10
5.3     0.02133     119        9.8     0.00174       9
5.4     0.02014     112        9.9     0.00165       8
5.5     0.01902     106       10.0     0.00157       8
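
Entries of Tables 4A and 4B can be spot-checked by computation. The sketch below rests on the standard fact that χ² with one degree of freedom is the square of a normal deviate, so that P is the probability of a normal deviation numerically exceeding √χ²; the function name is hypothetical and the use of `math.erfc` is our choice, not the method by which the tables were computed.

```python
import math

def chi2_p_one_df(chi2):
    """P for one degree of freedom: chance of exceeding chi2 on random sampling."""
    return math.erfc(math.sqrt(chi2 / 2.0))

checks = {
    0.25: chi2_p_one_df(0.25),   # Table 4A, chi-squared = 0.25
    1.0: chi2_p_one_df(1.0),     # shared entry of Tables 4A and 4B
    4.0: chi2_p_one_df(4.0),     # Table 4B, chi-squared = 4.0
    10.0: chi2_p_one_df(10.0),   # last entry of Table 4B
}
```

The value at χ² = 4.0 coincides with the entry for x/σ = 2.0 in Appendix Table 3, as it should.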





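APPENDIX TABLE 5.

t-Table. The Proportion of the Area of the Curve y = y₀/(1 + t²/ν)^((ν+1)/2) of Unit Area lying to
the Left of the Ordinate of Deviation t, for values of t proceeding by intervals of 0.1 from
0 to 6, and for values of ν from 1 to 20.

(Condensed to three figures from the four-figure tables by “Student” in Metron,
vol. 5, 1925, and published by permission of the proprietors of Metron and
“Student,” who has also very kindly supplied a few corrections to the original tables.)

t.     ν = 1.     2.     3.     4.     5.     6.     7.     8.     9.     10.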

0 

0*500 

0*500 

0 500 

0 500 

0 500 

0-500 

0 500 

0 500 

0 500 

0*500 

0-1 

*532 

•535 

*537 

537 

538 

538 

538 

539 

539 

•539 

.o 

•563 

•570 

573 

•574 

575 

•576 

576 

■577 

•577 

•577 

•3 

•593 

•604 

•608 

•610 

•612 

613 

61 1 

61 4 

-611 5 

615 

•4 

•621 

•636 

•642 

•645 

•647 

618* 

6 10’ 

•650 

*651 

•651 


■618 

■667 

•674 

678 

-681 

683 

681 

685 

•685 5 

•686 

•fi 

•672 

*605 

*705 

710 

*713 

•715 

716 

717 

•718 

•719 

•7 

•604 

•722 

•733 

•730 

*742 

•715 

717 

7 18 

749 

*750 

•8 

•715 

•746 

•750 

766 

•770 

773 

•775 

777 

•778 

•779 

•9 

*733 

*768 

•783 

790 s 

705 

799 

801 

803 

801 

•805 

1 0 

•750 

•780 

804' 

813 

•818 

•822 

•825 

•827 

828 

830 

l J 

765 

•807 

•824 

*833 r ’ 

830 

•813 

•816 

*818 

850 

•85! 

1 * 4> 

•770 

•823 5 

842 

852 

*858 

•802 

865 

868 

870 

•871 

1-3 

701 

838 

•858 

868 

875 

879 

883 

885 

8S7 

•889 

1-4 

•803 

*852 

•872 

•8 S3 

•800 

SOI * 

808 

000 " 

002 ’ 

<♦01 

1*5 

•M3 

864 

885 

806 

•003 

< M >S 

Oil 

01 1 

016 

918 

1 fi 

•822 

•875 

806 

008 

015 

02 < > 

025 

026 

•92S 

<♦30 

1 7 

•831 

884 

*906 

018 

025 

950 

933" 

•036 

038 

•940 

1-8 

830 

•803 

015 

*027 

05 1 

•030 

043 

015 

947 

•9 49 

10 

•846 

001 

023 

•035 

042 

017 

050 

<♦53 

055 

9,7 

2 0 

•852 

•008 

•030 

012 

040 

054 

057 

060 

962 

*963 

2 L 

858 5 

•915 

037 

018 

055 

960 

06*1 

96*5 ’ 

067 

969 

2*2 

•864 

021 

042 

031 

060 J 

965 

968 

070' 

972 

974 

2 3 

•860 r * 

926 

0 I7 f * 

9,8'’ 

065 

069 

0 72 ’ 

075 

•076- 

978 

2 4 

•871 

031 

052 

063 

060 

973 

979 

078 

080 

981 

2 5 

870 

035 

*056 

067 

073 

077 

979* 

081 ’ 

083 

981 

2(5 

*883 

•030 

060 

970 

076 

980 

0 X2 

08 4 

086 

087 

2 7 

•887 

013 

063 

073 

070 

082 

OS, 

986 & 

088 

989 

2 8 

891 

•046 

•066 

076 

OKI 

081 

087 

088 

900 

•991 

2 0 

804 

•010 

' 060 

078 

; 083 

086 ' 

OSS- ! 

090 

991 i 

992 

3 0 

•808 

•052 

•071 

980 

OS.5 

•088 

990 | 

001 ‘ 

092" 

093 

31 

•001 

055 

: 073 

982 

087 

| 080 | 

001 

003 

00 1 

99 1 

3 2 

004 

057 

! 075 

083'* 

088 

991 

j 002' 

001 

095 

995 

3*3 

006 

•060 | 

! 977 

085 

! 080 | 

! 092 i 

993 : 

005 

•005 

996 

3 1 

0 O0 

062 ! 

i 070 

086 

•<♦06 , 

993 

<♦01 

005 1 

996 

997 

3-5 

Oil 

061 

080 

OSS 

001 , 

001 

1 995 : 

006 

997 

997 

3-6 

•014 

065 

082 

980 

002 ! 

9< > 1 

j 906 i 

906'’ 

997 

998 

3 7 

•016 

•067 

083 

000 

•003 ! 

0 O.» 

I 006 

007 

997 s 

998 

3-8 

•018 

060 ! 

•084 

000 

004 ! 

095'’ 

007 

007 

998 

90S 

3*0 

! *020 

•070 

•985 

001 j 

004 

996 

<><»7 

008 

*998 

•90S 5 

40 j 

02 4> 

•071 

•086 

002 

005 

996 

007 

008 

998 

999 

4*1 ! 

*024 

•073 

*087 

•003 

005 

997 

90S 

<908 

999 1 

990 

4 2 | 

•026 

074 

088 

003 

096 

007 

908 

90S 5 

099 

*999 

4 3 

027 

075 

•088 

004 

006 

007* 

008 

909 

•999 

•999 

4 1 j 

*029 

076 

089 

094 

996* 

008 

008 

009 

099 

•999 

4*5 

•030 

077 

1 900 

095 

007 

008 

900 

909 

999 

999 

4 6 

*032 

078 

000 

•005 

007 

008 

009 

•009 

•999 

*999 r ’ 

4 7 

033 

•070 

•001 j 

005 

007 

•90s 

009 

999 

999 

1 000 

4*8 

035 

•080 

ooi ! 

•006 

098 

99S‘ J 

•009 

909 

999 f ’ 


4 0 

•036 

080 

•002 

•006 

*098 i 

•000 

090 

099 

1000 


5*0 

*037 

081 

*002 

006 

•998 

009 

*900 

999 s * 



5*1 

038 

082 

•903 

096 r * 

*008 

•090 

990 

•999 s 



5*2 

■030 5 

■982 r> 

•003 

007 

•098 

000 

•090 

1 000 



5*3 

■041 

*083 

•003 

007 

•908 

000 

*990 




5 4 

942 

*084 

•904 

007 

008 5 

•090 

•099 6 




5 5 

•043 

•984 

•004 

007 

000 

000 

•900' 




f>*6 

044 

•0S5 

*994 

007 B 

090 

900 

1 000 




5*7 

*945 

•085 

•005 

•90S 

•009 

•000 





5*8 

*046 

•986 

*095 

008 

•900 

•000 


















APPENDIX TABLE 5 (continued).

t-Table: area lying to the Left of the Ordinate of Deviation t, for values of t
proceeding by intervals of 0.1 from 0 to 6; values of ν from 11 to 20.

(From the four-figure tables by “Student” in Metron, vol. 5, 1925, published by
permission of the proprietors of Metron and “Student,” with a few corrections
to the original tables.)

t.     ν = 11.     12.     13.     14.     15.     16.     17.     18.     19.     20.

0 

0-500 

0-5O0 

0-500 

0 500 

0 500 

0 500 

0 500 

0 500 

0*500 

0 500 

0*1 

•539 

•539 

•539 

•539 

539 

•539 

539 

539 

•539 

•539 

2 

577 

-578 

•578 

578 

578 

578 

578 

•5 78 

578 

57 « 

:i 

•015 

015 

•6I5 6 

•010 

•010 

010 

•616 

•010 

•616 

010 

•4 

•052 

•052 

•052 

052 

053 

•053 

*653 

•053 

*653 

•053 

5 

GKO* 

•087 

•087 

088 

088 

088 

688 

•688 

•689 

•689 

•0 

-720 

•720 

•721 

721 

721 

721* 

•722 

722 

*722 

■722 

*7 

•751 

•751 

•752 

-752 

75.3 

753 

•753 

•754 

•754 

754 

8 

-780 

•780 

•781 

78 J 5 

782 

•782 

•783 

783 

•783 

783 

9 

800 

•807 

808 

•808 

809 

809 

•810 

810 

•810 

811 

!•() 

♦831 

•831* 

832 

833 

833 

•834 

834 

835 

•8.35 

•835 

1 1 

-853 

853'* 

*85 1 

855 

8,»0 

•850 

•857 

•857 

857* 

•858 

1 2 

•872 

873 

•874 

875 

870 

870 

877 

877 

878 

•87 8 

J-3 

890 

891 

892 

893 

893 

891 

891* 

895 

895 

890 

1 1 

905* 

9o7 

OOP 

90S 

909 

910 

•910 

911 

911 

•912 

1 5 

9 i 9 

920 

921 


92.3 

92.3* 

•921 

921 * 

925 

•925 

1 0 

931 

032 

933 

934 

933 

935 

930 

930* 

9.37 

937 

1 7 

911 

913 

913* 

911 

915 

940 

•940 

947 

917 

•948 

1 8 

950 

951 * 

952* 

953 

951 

955 

955 

956 

950 

950* 

1 9 

958 

050 

900 

901 

902 

902 

•90.3 

963 

90 4 

•904 

2 0 

905 

900 

907 

907 

908 

909 

909 

970 

970 

970 

2 1 

970 

071 

972 

973 

973* 

•971 

974* 

975 

•975 

•970 

22 

975 

070 

977 

977 

97.8 

979 

979 

979 

•980 

980 

I 2 3 

979 

080 | 

981 

9SI 

982 

•982 

983 

9S.3 

983* 

984 

1 2 t 

982 

083 

981 

985 

985 

•985* 

980 

980 

•987 

987 

l 2 5 i 

985 

| 080 

987 

987 1 

9SS 

988 

•988* 

989 

*989 

989 

2 0 ! 

988 

•088 

: 9S9 ! 

; 9,89* 1 

990 

990 

•991 

991 

991 

991 

i 27 | 

990 

000 

! 991 ; 

991 

902 | 

992 i 

i 992 

993 

993 

993 

I 2 8 

, -991 

002 

092' 

993 

993 

99 1 

1 99 4 

994 

994 

994 6 

1 2 9 I 

1 993 

! 003 1 

991 i 

99 1 

99 P I 

1 994* 

995 

995 

995 

990 


! -994 

09 1* 

995 j 

; 995 i 

! -90,,* 

990 ! 

! 990 

•990 

990 

990* 

I 31 

995 

005 

•990 1 

990 ] 

990 

997 

' 997 

•997 

997 

997 

3 2 

990 

090 

•990* ' 

997 

997 

| 997 

i 997 

997* 

998 

998 

3 :i 

990* 

097 

997 

997 

998 

998 

998 

998 

•998 

•998 

3 4 

997 

-997 

-998 

J 998 

i 99 S 

998 

•998 

998 

998* 

999 

3 r> 

997* 

; -008 

998 

998 

998 

998* 

•999 

| -999 

999 

999 

3 6 

•998 

998 

998 

•999 

•999 

•999 

999 

•999 

•999 

999 

3 7 

998 | 

998* 

•999 

999 

999 

999 

999 

999 

•999 

999 

3 8 

998* 

999 

999 

! 999 

999 

•999 

999 1 

999 

999 

999 

3 9 

999 

999 

999 

999 

999 

999 

999 

999* 1 

•999* 

1 000 

4 u 

999 

999 

999 

•999 

999 

999* ( 

999* ' 

1 000 i 

1 000 


4-1 

999 

•090 

999 

999** 1 

999* . 

1 000 

1 000 




4 2 

999 

000 

999* 

1 000 

] 000 1 






4-3 

999 

•ooo r> ! 

1 000 


, 






4-4 

999* 

1 000 , 









4 5 

999" | 










4-0 

1 000 j 











Note. The methods by which “Student” calculated the Metron tables are
explained in notes by him and R. A. Fisher in that journal, vol. 5, part 3, 1925,
pp. 18-24. The four figures of those values have been rounded up to three in
the above table, except when the four-figure value concluded with a 5, in which
case it is shown in full. In columns in which values greater than 0.9995 occur
the first is written 1.000 and the remainder left blank.
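
Individual entries of the t-table can be recomputed as follows. This is a sketch under assumptions, not the method used by “Student”: it takes the curve as the usual t-distribution with ν degrees of freedom, with normalising constant Γ((ν+1)/2)/(√(νπ) Γ(ν/2)), and integrates it by Simpson's rule. The function names and the number of steps are hypothetical choices.

```python
import math

def t_density(t, nu):
    """Ordinate of the t-curve of unit area with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2.0) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2.0))
    return c * (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)

def t_area_left(t, nu, steps=2000):
    """Area to the left of t: one half plus Simpson's-rule integral from 0 to t."""
    if t == 0:
        return 0.5
    h = t / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, nu) + t_density(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, nu)
    return 0.5 + total * h / 3.0

area_t1_nu1 = t_area_left(1.0, 1)    # closed form: 1/2 + arctan(1)/pi
area_t2_nu10 = t_area_left(2.0, 10)  # compare with the entry t = 2.0, nu = 10
```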























APPENDIX TABLE 6A. 

(Reproduced by kind permission of Prof. R. A. Fisher and Messrs Oliver & Boyd 
from the former’s “ Statistical Methods for Research Workers .”) 

5 Per Cent. Points of the Distribution of z. 







Values of ν₁.

ν₂       1.       2.       3.       4.       5.       6.       8.       12.       24.       ∞.


1 

2*5421 

2 0479 

2 0870 

2 7071 

2 7191 

2 7270 

2 7380 

2 7484 

2 7588 

2 7093 


2 

1 1592 

1 4722 

1 1705 

1-4787 

1 4800 

1 1808 

1 1819 

1 *4830 

1 1810 

1 4851 


3 

11577 

11284 

1*1137 

1 *1051 

1*0991 

1 *0953 

1 0899 

1 08 42 

1 0781 

1 0710 


4 

1 0212 

•9090 

•9429 

•9272 

9108 

*9093 

•8993 

8SS5 

87(57 

8039 


r> 

•9441 

8777 

•8141 

8230 

8097 

7997 

7802 

*7714 

7550 

7308 


c> 

•8918 

•8188 

•7798 

•7558 

•7391 

•7274 

7112 

•0931 

0729 

0499 


7 

•8600 

•7777 

•7347 

•7080 

0890 

•0701 

•0770 

0309 

0131 

•5802 


8 

•8355 

•7175 

•7014 

•0725 

0525 

0378 

•0175 

•5945 

50S2 

•5371 


0 

•8163 

7242 

•0757 

0450 

•0238 

0080 

*5802 

5013 

5321 

•1979 


10 

•8012 

7058 

•0553 

•0232 

0009 

5813 

503 J 

•5310 

5035 

•4057 


11 

•7889 

•0909 

•0387 

•0055 

5822 

•5048 

5400 

5120 

•4795 

•1387 


12 

•7788 

0780 

0250 

5907 

5000 

5187 

5234 

•49 41 

4592 

•4150 


13 

7703 

•0082 

0134 

•5783 

5535 

5350 

•5089 

1785 

•4419 

3957 


14 

•7630 

•0594 

•6036 

*5077 

•5423 

•5233 

•mu 

•40 49 

4209 

•3782 

A, 

15 

•7508 

•0518 

5950 | 

5585 

•5320 

•5131 

48.V. 

4532 

•4138 

•3028 

-M 

10 

•7514 

0451 

5870 ! 

551 >5 

5211 

•50 42 

•4700 

' *4128 

| 4022 

3190 

o 

17 

•7460 

0393 

•5811 

5134 

5100 

•1901 

4070 

1 *4337 1 

1 3919 

•3300 

cu 

18 

•7424 

6341 

•5753 

•5371 

•5099 

j 4894 

•4002 

1 *4255 ; 

1 3827 

j *3253 


19 

•7380 

0295 

•5701 

•5315 

5040 

! -1832 

1535 

•4182 

3743 

•3151 

K- 

20 

7352 

! -0254 

•5054 

5205 

*4980 

| 1770 

•4174 

•4110 j 

•3008 

3057 

1 

21 

| 7322 

•6210 

•5012 

! *5219 

•1938 

•4725 

•1420 

4055 

i 3599 

2971 


22 

7294 

0182 

5574 

•5178 

4891 

1079 

•4370 

•4001 

! 3530 

•2892 


23 

•7209 

j *015] 

•5540 

*5110 

4874 

*4030 

1325 

3950 

•3478 

2818 


21 

*7240 

•0123 

•5508 

•5100 

4817 

•4598 

•4283 

•390 4 

! 3425 

•2749 


25 

•7225 

•0097 

•5478 

5074 

•1783 

•4502 

•4241 

■3802 

i -3370 

•2085 


20 

7205 

•0073 

5451 

5045 

•4752 

4529 

•4209 

•3823 

•3330 

•2025 


27 

7187 

•0051 

•5427 i 

•5017 

4723 

•4199 i 

•4170 

3780 

3287 

•2509 


28 

7171 

•0030 

•5403 

•4992 

4090 ! 

•4471 

•4110 

, -3752 

•3218 

j 2510 


29 

7155 

0011 

*5382 

•4909 j 

4071 

■4 1 44 

•4117 

1 -3720 

| 3211 

•2400 


30 

•7141 

•5994 

•5302 

•4947 

•4018 

•4120 

4090 

[ 3091 | 

*3170 

•2419 


CO 

•0933 

•5738 

•5073 

•4632 

•4311 

4004 

•3702 

3255 ! 

•2051 

•1044 


QO 

•0729 

*5480 ; 

4787 

__ 

•4319 j 

3974 

•3700 

3309 

•2804 

•2085 

0 
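
For readers who work with the ratio of two variance estimates rather than with z, the two quantities are connected by z = ½ logₑ(ratio). The sketch below assumes this standard definition of Fisher's z; the function names are hypothetical, and the first entry of the table above (ν₁ = 1, ν₂ = 1, z = 2.5421) is used as the example.

```python
import math

def z_to_variance_ratio(z):
    """Variance ratio corresponding to a value of Fisher's z."""
    return math.exp(2.0 * z)

def variance_ratio_to_z(ratio):
    """Fisher's z corresponding to a ratio of two variance estimates."""
    return 0.5 * math.log(ratio)

# 5 per cent. point for nu1 = 1, nu2 = 1, as tabulated above.
ratio_1_1 = z_to_variance_ratio(2.5421)
```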







Values of ν₂.




APPENDIX TABLE 6B. 

(Reproduced by kind permission of Prof. R. A. Fisher and Messrs Oliver & Boyd 
from the former’s “ Statistical Methods for Research Workers.”) 

1 Per Cent. Points of the Distribution of z. 


Values of ν₁.

ν₂       1.       2.       3.       4.       5.       6.       8.       12.       24.       ∞.

1 

4-1535 

4 2585 

4 2974 

4*3175 

4 3297 

4 3379 

1 3182 

4 3585 

4*3089 

4-3794 

2 

2 2950 

2 2970 

2 2984 

2*2988 

2 2991 

2-2992 

2 2991 

2 2997 

2 2999 

2*3001 

3 

1*7040 

1*7140 

1 0915 

1 *0780 

1 0703 

1 0015 

! 0509 

1-0489 

1 0404 

1-0314 

4 

1*5270 

1*4152 

1 1075 

1 3850 

I 3711 

1 3009 

1 3473 

1 3327 

l 3170 

1 3000 

r> 

1*3943 

1*2929 

1 24 19 

1 2101 

1 1971 

1 1838 

1 1050 

11457 

1*1239 

1-0997 

0 

1*3103 

1 1955 

1-1401 

l 1008 

J 0813 

1-0080 

1 0400 

1 0218 

•9948 

•9043 

7 

I 2520 

1 1281 

1 0072 

1*0300 

1 0018 

•9804 

•9014 

9335 

-9020 

•8058 

8 

1 2100 

1 0787 

1*0135 

9734 

9459 

•9259 

S9S3 

8073 

8319 

•7904 

9 

1 1780 

1 0411 

9724 

9299 

•9000 

•8791 

•819 4 

•8157 

7709 

•7305 

10 

I 1535 

1-0114 

9399 

8954 

8010 

•8419 

•8104 

•7744 

•7324 

•6810 

11 

1*1333 

•9871 

*9130 

•8074 

•8354 

•8110 

7785 

7405 

•0958 

•0408 

12 

1 HOC 

*9077 

8919 

84 13 

Bill 

•7804 

•7520 

•7122 

6619 

•6061 

IK 

1*1027 

9511 

8737 

8248 

7907 

•7052 

7295 

0882 

•0386 

•5701 

14 

1*0909 

•9370 

8581 

8082 

*7732 

7171 

•7103 

•0075 

6159 

•5500 

J6 

1*0807 

•9249 

8118 

7939 

7582 

•7314 

•6937 

6496 

•5901 

•5269 

16 

1 0719 

•9144 

8331 

7814 

7450 

7177 

0791 

•0339 

•5780 

•5064 

17 

1 0041 

•9051 

8229 

*7705 

7335 

•7057 

•0003 

0199 ! 

5030 

•4879 

18 

1*0572 

8970 

*8138 

•7007 

7232 

0950 

•0519 

0075 

*5491 

•4712 

19 

I 051 l 

•8897 

8057 

•7521 

7140 

0854 

•0147 ! 

•5904 1 

5306 

•4500 

20 

1 0457 

8831 

7985 , 

| 74 43 

7058 

•0708 ! 

6355 

•5801 

5253 

•4421 

21 

1 0408 

8772 

7920 

J 

7372 ! 

098 1 

0090 

0272 

5773 

• 5150 ; 

•4294 

22 

1 0303 

8719 

•7800 

i 7309 

0910 

0020 

•0190 

5091 

*5050 j 

•4176 

23 

1*0322 

•8070 

•7800 

7251 1 

•0855 

, *(>555 

0127 

| 5015 

*4909 

•1068 

24 

1 0285 

8020 

7757 

! *7197 j 

•0799 

! *0190 

•0001 

! 5545 

4890: 

! 3967 

25 

1 0251 

•8585 

7712 

•7148 1 

•0747 

•0442 

*6000 

•5481 

•4810 

| -3872 

20 

1-0220 

•8548 

•7070 

7103 

0099 

•0392 

•5952 

5122 

•4748 

1 -3784 

27 

1 0191 

•8513 

7031 

7002 

0055 

6310 

5902 

•5307 

•4685 

•3701 

28 

1 0104 

•8481 

•7595 

i 7023 

0014 

•(>303 

•5850 

5310 

•4626 

•3624 

29 

1 0139 

8451 

•7502 

! *0987 

•0570 

•0203 

•58! 3 

5209 

•4570 

•3550 

30 

1*0110 

8423 

•7531 

•0954 

0540 

•0220 

•5773 

•5224 

•4519 

*3481 

CO 

•9784 

•8025 

•7080 

•0472 

I 

* 0028 

•5087 

1 *5189 

4574 

3740 

2352 

00 

•9402 

•7030 ( 

•0051 

•5999 ! 

i 

5522 

5152 

•4004 

•3908 

*2913 

0 











Values of ν₂.




APPENDIX TABLE 6C. 

(Reproduced by kind permission of Prof. R. A. Fisher, Dr W. E. Deming and Messrs
Oliver & Boyd from Prof. Fisher’s “Statistical Methods for Research Workers.”)

0.1 Per Cent. Points of the Distribution of z.


Values of ν₁.

ν₂       1.       2.       3.       4.       5.       6.       8.       12.       24.       ∞.

1 

6 4502 

6*5612 

6 5966 

6 6201 

6 6323 

6 6405 

6 6508 

6-661! 

6 6715 

6 6819 

2 

3 4531 

3-4534 

3*4535 

3 4535 

3 1535 

3 4535 

3 4536 

3*4537 

3 4536 

3 4536 

3 

2 5604 

2 5003 

2 1748 

2 1603 

2 1511 

2 4446 

2 4361 

2 4272 

2 1179 

2*1081 

4 

2-1529 

2 0574 

2 0143 

1 9892 

1 9728 

1 9612 

1 9459 

1 9291 

1 9118 

1*8927 

5 

1 9255 

1 8002 

1 7513 

1 7184 

1 6961 

1 osos 

1 6596 

1 6370 

1 6123 

1 5845 

0 

1 7849 

1 6479 

1 582S 

1 5433 

1 5177 

I 1986 

1 4730 

l 4419 

1 4134 

1 3783 

7 

1*6874 

1 *5384 

1 4662 

1 4221 

1 3927 

1 3711 

! 3117 

1 3090 

1 2721 

1*2296 

8 

1 6177 

1 1587 

1*3809 

1 3332 

1 3008 

1 2770 

1 2113 

1 2077 

1 1662 

1*1169 

9 

] 5640 

1 *3982 

1 3160 

1 2653 

1 230 4 

1 20 47 

1 1691 

1 1293 

1 0830 

1 0279 

10 

1 5232 

1 3509 

1 2650 

1 2116 

1 1718 

1*1475 

1*4098 

1*0668 

1 0165 

•9557 

11 

1*4900 

1 3128 

i 

1 1!2I!8 1 1 1(183 

1 1297 

1 J 012 

1 0614 

1*0157 

9619 

•8957 

12 

1 4627 

1 2814 

1 1900 

1 1326 

I 0926 

1 0628 

1 0213 

9733 

9162 

•8450 

13 

1*4400 

1 *2553 

3 1616 

1 1026 1 1*0614 

1*0306 

•9877 

9374 

8774 

8014 

14 

J 1208 

l 2332 

1 1376 

i 0772 

1 0348 1 1 0031 

9586 

9066 

*8439 

•7635 

15 

1 4013 

1 2141 

1 1169 

I (1.75:! 

10119 

9795 

9336 

•8800 

8147 

7301 

If, 

1 3900 

1*1976 

1 0989 

1 0362 

9920 

9588 

9119 

*8567 

7891 

•7005 

17 

1 3775 

1 1832 

1 0832 

1 0195 

9745 

9407 

8927 

•8361 

7664 

•6740 

18 

1 3665 | 

i 1 1701 

1 0693 

1*0047 

9590 

9216 

, 8757 

8178 

7 162 

•6502 | 

19 

1 :u>7 1 1 J. r )!ll 

J 0569 

9915 

9442 

9103 

8605 

•son 

7277 

6285 | 

20 

1 34 SO 

J 1489 , 1 0458 

j 

9798 

9329 

897 1 

1 846!) 

i 7867 

7115 

•6086 | 

21 

1*3401 

1*1398 | J 0358 

9691 

•9217 

8858 

8316 

7735 

6964 

■5904 

*)0 

1 3329 

1 1315 

1 0268 

•9595 

9116 

8753 

! 8234 

7612 

i 6828 

•5738 

2.1 

1 3261 

1*1210 1 1*0186 

9507 

9021 

86. >7 

8132 

7501 

| 6704 j 

5583 

24 

1*3205 

1 1171 

1*0111 

9427 

8930 

856!) 

8038 

7400 

! 6589 

5440 

2:> 

1*3151 

1*1108 

1 0041 

•9354 

8862 

8489 

1 7953 

I 7306 

1 6483 

•5307 

20 

1*3101 

1 1050, 9978 

92S6 

8791 

*8115 

•7873 

1 7220 

•6385 

•5183 

27 

1 3055 

l 0997 

9920 

9223 

8725 

8316 

•7800 

7140 

6294 

•5066 

28 

1 3013 

1*0947 

■9866 

9165 

8664 

8282 

7732 

•7066 

•6209 

•4957 

29 

1 2973 

1 0903 

9815 

9112 

•8607 

8223 

7679 

•6997 

•6129 

4853 

30 

1*2930 

1*0859 

9768 

9061 

*8554 

*8168 

7610 

1 

•6932 

•6056 

•4756 

40 

1 2674 

1 0552 

9435 

8701 

•8174 

7771 

7184 

6463 i 5513 

i *4016 

00 

1*2413 

1 0248 

•9100 

i 

•8345 

•7798 

•7377 

•6760 

i 

•5992 

4955 

•3198 

00 

1*1910 

•»coa 

8453 

1 7648 

i 

•705!) 

•6599 

5917 

•5044 

•3786 

0 














[Chart not reproduced; horizontal axis: Degrees of freedom.]








ANSWERS 


TO, AND HINTS ON THE SOLUTION OF, THE EXERCISES
GIVEN IN THE VARIOUS CHAPTERS.


CHAPTER 1. 


N 

20,287 * 

(AH) 

887 

(A) 

2,808 

(AC) 

87 !• 

(H) 

2,858 

(HO 

858 

(n 

719 

(AM) 

1 19 

(AM) 

150 

( aM ’) 

179 

(Ally) 

48 L 

<««, ) 

1,219 

u,>n 

272 

(«!><') 

108 

(A M 

759 

(«!’)') 

20,501* 


LT The frequencies not given in the question itself are: 

(") [All) 107 (AC) J0;> (11C) 525. 

(b) (Afly) 22,980 (ally) 18,585 ( a [lC) 90,178 (a/>» 28,808,495. 


1.4. 

(AH) 

(I!) 

(All) 

(II) 

(All) 

<th 

" (AB)-,(Ap) 

(II) 1 (/*■) 

that is 

(All) 

(») 

( t) 

\ ’ 

. 

(A) 

A T -(A) 

that is 

(All) 

MU) 

(-0 
' («) 




1.5. (AB) + (BC) − (B), i.e. the sum of the excesses of (AB) and (BC) over
(B)/2.

1.8. 100. Take A = husband exceeding wife in first measurement, B = husband
exceeding wife in second measurement, and find (αβ).

1.9. 88. If A, B, C denote passing first, second and third examinations,
(C), (αβC) and (ABγ) are all that is necessary to answer the question. The other
five frequencies (including N) are redundant.

Further, N = (αβC) + (αβγ) + (A) + (B) − (ABC) − (ABγ), i.e. there is a linear
relation between the given frequencies and the ultimate frequencies are therefore
indeterminate.
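
The linear relation quoted in the answer to 1.9 holds for any set of ultimate class-frequencies, which can be confirmed by brute force. The sketch below is an illustration only: it draws arbitrary frequencies for the eight ultimate classes (the random numbers are not data from the exercise) and checks that N = (αβC) + (αβγ) + (A) + (B) − (ABC) − (ABγ). The function name is hypothetical.

```python
import random
from itertools import product

def relation_holds(trials=1000, seed=0):
    """Check N = (abC) + (abg) + (A) + (B) - (ABC) - (ABg) on random frequencies."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Ultimate class-frequencies keyed by (has A, has B, has C).
        u = {key: rng.randint(0, 50) for key in product((True, False), repeat=3)}
        n = sum(u.values())
        freq_a = sum(v for (a, b, c), v in u.items() if a)
        freq_b = sum(v for (a, b, c), v in u.items() if b)
        rhs = (u[(False, False, True)] + u[(False, False, False)]
               + freq_a + freq_b
               - u[(True, True, True)] - u[(True, True, False)])
        if n != rhs:
            return False
    return True

identity_ok = relation_holds()
```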

1.10. 10 per cent. 


CHAPTER 2. 

2.1. 80/208 or 804 per thousand. 

2.2. 55/85 or 65 per cent.

2.3. 82 per cent. and 80 per cent.

2.4. 117. 

2.5. 108. 

2.8. (l -2 q), p { \ (l i 2 q) y i.e. p must lie between 0 and | (1 -2 q) or 

between £ (1 + 2q) and 




2.9. As a hint, remember the condition that

(BC) ≮ (B) + (C) − N

2.10. If A, B, C denote liking chocolates, toffee or boiled sweets, (αβγ) is
negative.

CHAPTER 3.

3.1. Deaf-mutes from childhood per million among males 222; among 
females 183; there is therefore positive association between deaf-mutism and 
male sex: if there had been no association between deaf-mutism and sex, there 
would have been 3179 male and 3393 female deaf-mutes. 

3.2. (a) Positive association, since (AB) 0 1157. 

(b) Negative association, since 291/MM) -=3/5, 380/570=2/3. 

(c) Independence, since 250/708=1/3, 18/111 -1/3- 

3.3. Percentage of Plants above the Average Height.

                              Parentage
                        Crossed.          Self-fertilised.
Ipomoea purpurea        86 per cent.      25
Petunia violacea        79                17
Reseda lutea            78                .‘H
Reseda odorata          71                45
Lobelia fulgens         50                35

The association is much less for the species at the end than for those at the
beginning of the list.

3.4. Percentage of dark-eyed amongst the sons of dark-eyed fathers 39 per
cent.

Percentage of dark-eyed amongst the sons of not dark-eyed fathers 10 per
cent.

If there had been no heredity, the frequencies to the nearest unit would
have been (AB)₀ = 18, (Aβ)₀ = 111, (αB)₀ = 121, (αβ)₀ = 750.

3.5. Percentage of light-eyed amongst the wives of light-eyed husbands 59
per cent.

Percentage of light-eyed amongst the wives of not light-eyed husbands 53
per cent.

If there had been no association: (AB)₀ = 298, (Aβ)₀ = 225, (αB)₀ = 143,
(αβ)₀ = 108.
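
The independence values in the answers to 3.4 and 3.5 follow from the rule (AB)₀ = (A)(B)/N and its analogues. The sketch below recomputes them; note that the marginal totals used, (A) = 129, (B) = 139, N = 1000 for 3.4 and (A) = 523, (B) = 441, N = 774 for 3.5, are not quoted in the answers but are inferred here by summing the four independence values, so they should be read as assumptions. The function name is hypothetical.

```python
def independence_frequencies(freq_a, freq_b, n):
    """Frequencies (AB), (A beta), (alpha B), (alpha beta) expected if A and B are independent."""
    not_a = n - freq_a
    not_b = n - freq_b
    return (freq_a * freq_b / n,
            freq_a * not_b / n,
            not_a * freq_b / n,
            not_a * not_b / n)

# Marginal totals inferred from the printed answers (assumed, see above).
ex_3_4 = [round(v) for v in independence_frequencies(129, 139, 1000)]
ex_3_5 = [round(v) for v in independence_frequencies(523, 441, 774)]
```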

3.6. The following are the proportions of the insane per thousand in successive
age-groups:

In general population: 0.9, 2.3, 4.1, 5.7, 6.9, 7.5, 7.7, 6.8
Amongst the blind: 20.1, 16.0, 16.3, 20.7, 18.3, 17.8, 11.4, 5.3

Note the diminishing association, which is especially clear in the age-group
65-, and the negative association in the last age-group. The association
coefficient gives the values below, which decrease continuously:

Association coefficient: +0.92, +0.75, +0.61, +0.57, +0.46, +0.41,
+0.20, −0.13.
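
The trend of the association coefficient in 3.6 can be reproduced approximately from the rates alone. The sketch below is a rough check under assumptions: it uses the coefficient Q = (ad − bc)/(ad + bc), treats the rate in the general population as if it were the rate amongst the not-blind (the two differ very little, the blind being few), and so need not agree with the printed values in the last figure. The function name is hypothetical.

```python
def association_q(rate_group, rate_reference, base=1000.0):
    """Coefficient of association from two rates per `base`, via the ratio of odds."""
    odds_group = rate_group / (base - rate_group)
    odds_reference = rate_reference / (base - rate_reference)
    return (odds_group - odds_reference) / (odds_group + odds_reference)

general = [0.9, 2.3, 4.1, 5.7, 6.9, 7.5, 7.7, 6.8]
blind = [20.1, 16.0, 16.3, 20.7, 18.3, 17.8, 11.4, 5.3]
q_values = [association_q(b, g) for b, g in zip(blind, general)]
```

The computed values fall steadily through the age-groups and change sign in the last, as the answer states.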

3.10. +0.90.

3.11. −0.70.

3.13. The frequencies are, for association:

(1)  (AB)   0           (2)  (AB)  (Aβ)         (3)  (AB)   0
     (αB)  (αβ)               0    (αβ)               0    (αβ)

and for disassociation:

(1)   0    (Aβ)         (2)  (AB)  (Aβ)         (3)   0    (Aβ)
     (αB)  (αβ)              (αB)   0                (αB)   0

CHAPTER 4.

4.1. (D)jN - 6 9 per cent. (A)jN = 6*8 per cent. 

(Al)) I (A) 150 (AD) 1(D) 14 6 

umm - :u> ., i^)/(/o -4*7 

(A(lD)l(A(t) - J1 2 (AfjD)l([iD) -54 9 

(HD) 1(1$) 1*2*7 „ (AB)I(H) 20 2 

(ABD)j(A H) 51*0 „ (ABD)I(BD) 05 0 

The above give two legitimate comparisons. The general results are the same
as for the boys, i.e. a very small association between development defects and
dulness amongst those exhibiting nerve signs, as compared with those who do
not exhibit nerve signs, or with the girls in general. As the association amongst
those who do not exhibit nerve signs is quite as high as for the girls in general,
the “conclusion” quoted does not seem valid.

*1.2. (1) (*2) (1) (2) 
Per Per Per Per 

thousand, thousand. thousand, thousand. 

(B)!N a 2 7 5 (A)IN 0 9 4*0 

(AB)I(A) 119 117 ( AB),(B) 4 0 0 8 

(BC)K(') 08 8 08 0 (AC)I(C) 6 6 18*8 

(ABV)j(AC) 216 211 (ABC)I(BC) 66 8 66*8 

The above give the two simplest comparisons, either of which is sufficient to
show that there is a high association between blindness and mental derangement
amongst the deaf-mutes as well as association in the general population; amongst
the old, the association is, in fact, small for the general population, but
well-marked for deaf-mutes. This result stands in direct contrast with that of
Exercise 4.1, where the association between the two defects A and D was much
smaller in the defective universe B than in the universe at large. As previously
stated, no great reliance can be placed on the census data as to these infirmities.

4.3. If the cancer death-rates for farmers over 45 and under 45 respectively
were the same as for the population at large, the rate for all farmers 15- would
be 1.11. This is slightly less than the actual rate 1.20, but the excess would not
justify the statement that “farmers were peculiarly liable to cancer.” It is, in
point of fact, due to the further differences of age-distribution that we have
neglected, e.g. amongst those over 45 there are more over 55 amongst farmers
than amongst the general population, and so on.

4.4. 15 per cent. 

4.6. If A and B were independent in both C and γ universes, we would have
(AB) equal to

(471 × 419)/617 + (151 × 189)/520 = 374.7

Actually (AB) is only 358. Therefore A and B must be disassociated in one
partial universe or both.






4.9. (1) 68.1 per cent. (2) 42.5 per cent. The possible fallacy that a total
association between “spending more than one's opponent” and “winning” only
meant that Conservatives spent more and that Conservative principles carried
the day is now avoided, and there seems no reason for declining to consider this
as evidence of the effect of expenditure on election results.

4.10. The limits to y are

y ](3r - C -1) 

> \{x i a-) 

subject to the conditions y f «r, y <1 0, // < 2,r - 1. No inference of a positive 
association from two negatives is possible unless x lies between the limits 
0*382 . . 0*618 . . . 

4.11. The limits to y are 

(1) y < \{(\x - 6<r a -1) 

-> J(.r+ 6,r 2 ) 

subject to conditions y 4 0, | Ir I, f> ,r. 

An inference isonl^ possible from positive associations of .17/ and 1C li r \> l; 
an inference* is only possible from two negati\e associations ll x be between 
0*211 . . . and 0*274 . . . Note that x cannot exceed J. 

(2) t/- i(6r ‘\r i ~ 1) 

* 1(2r i 3C) 

subject to conditions ?/ ,0, 5 1 1, x. 

No inference is possible from positive associations of AB and BC.

An inference* is only possibh from negative associations d a lit between 0 183 
. . . and 0*215 . . . Note that x cannot exceed j. 

(3) y < 4{6a - 2«r 2 -1) 

* - 

subject to the conditions y } 0, t 5i - 1, > x. 

As in (2), no inference is possible from positive associations of AC and BC;
an inference is possible from negative associations if x lie between 0.177 . . .
and 0.224 . . . Note that x cannot exceed


CHAPTER 5.

5.1. A, 0.68; B, 0.36.

5.2. C = 0.02, T = 0.01.

5.4. The table is not isotropic as it stands. It becomes positively so if the 
columns arc arranged in the order A x . A j, A „ 1 4 , A,, and the rows m order (from 
top to bottom) /?,, B^ B x . 

5.5. C = 0.05, T = 0.03.

5.7. C = 0.40. For a large number such as 1000 this is probably significant,
i.e. not due to fluctuations of sampling. From inspection of the tables the
contingency is positive, i.e. this evidence would suggest that persons tended on
the whole to prefer music of their own nationality. But there are exceptions,
e.g. the English.

In any case these data are purely imaginary, and it is not suggested that they
reflect in any way the true state of affairs.

5.8. C = 0.23, T = 0.17, suggestive of slight association.

5.10. C = 0.10.





CHAPTER 6. 

6.1. 1200, 200.

6.2. 270, 40.

6.3. 95.75.

6.4. 216.5.

6.5. (a) J-shaped; (b) U-shaped; (c) single-humped moderately asymmetrical;
(d) J-shaped in all three cases.


CHAPTER 7. 

7.2. 14.58.

7.3. Mean, 156.73 lb. Median, 154.67 lb. Mode (approx.), 150.6 lb. (Note
that the mean and the median should be taken to a place of decimals further than
is desired for the mode; the true mode, found by fitting a theoretical frequency
curve, is 151.1 lb.)

7.4. Mean, 0.6330. Median, 0.6391. Mode (approx.), 0.651. (True mode
is 0.653.)

7.5. About £3250.

7.0. Mean " ^ \ 

7.7. (1) 82.75, (2) 81.78, (3) 80.25, (4) 80.25.

7.8. Arithmetic mean $= \dfrac{2^{n+1} - 1}{n + 1}$.

Geometric mean $= 2^{n/2}$.

Harmonic mean $= \dfrac{n + 1}{2\left(1 - 2^{-(n+1)}\right)}$.



7.9. Mean np. If the terms of the given binomial series are multiplied by
0, 1, 2, . . ., note that the resulting series is also a binomial when a common
factor is removed. (A full proof is given in Chapter 10.)
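
The result in 7.9 can be checked numerically. The sketch below sums the binomial terms directly and compares the mean with np; the values n = 12 and p = 0.25 are illustrative assumptions, not figures taken from the exercise.

```python
from math import comb, sqrt

def binomial_mean_sd(n, p):
    """Mean and standard deviation of the binomial (q + p)^n by direct summation."""
    q = 1.0 - p
    terms = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]
    mean = sum(r * t for r, t in enumerate(terms))
    var = sum((r - mean) ** 2 * t for r, t in enumerate(terms))
    return mean, sqrt(var)

n, p = 12, 0.25  # illustrative values only
mean, sd = binomial_mean_sd(n, p)
print(mean, n * p)                 # the two means should agree
print(sd, sqrt(n * p * (1 - p)))   # as should the two standard deviations
```

The same summation also gives the standard deviation, which should agree with the square root of npq.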

7.11. (1) 921,507, (2) 916,963. 

7.12. For N.M. specials, 15s. 1d. per 120; for ordinaries, 12s. 9d. per 120.


CHAPTER 8. 

8.2. Standard deviation 21.3 lb. Mean deviation 16.4 lb. Lower quartile
142.5, upper quartile 168.4; whence Q = 12.95. Ratios: m.d./s.d. = 0.77,
Q/s.d. = 0.61.

8.3. Median = £3250, upper quartile = £5000, 9th decile = £8600 approximately.

8.4. $Q_1$ = 24.13 years. Median = 27.29 years. $Q_3$ = 32.19 years. Q = 4.03
years.

8.5. 2.872.

8.6. This proposition is equivalent to the one that the square of the mean of
a set of positive numbers is less than the mean of the squares. This is proved in
most text-books on Algebra.

8.8. (1) M = 73.2, $\sigma$ = 17.3; (2) M = 73.2, $\sigma$ = 17.5; (3) M = 73.2, $\sigma$ = 18.0.
(Note that while the mean is unaffected in the first place of decimals, the
standard deviation is higher the coarser the grouping.)

8.9. England, $\sigma$ = 2.55; Scotland, $\sigma$ = 2.48; Wales, $\sigma$ = 2.33; Ireland,
$\sigma$ = 2.15 inches. For the weight distribution $\sigma$ = 21.11 lb.

8.10. $\sqrt{npq}$. The proof is given in Chapter 10.



8.11. The assumption that observations are evenly distributed over the
intervals does not affect the sum of deviations, except for the interval in which
the mean or median lies; for that interval the sum is $n_2(0.25 + d^2)$, hence the
entire correction is

\[ d(n_1 - n_3) + n_2(0.25 + d^2) \]

In this expression d is, of course, expressed as a fraction of the class-interval,
and is given its proper sign.

8.14. 3.80, 3.65, 3.53, 3.20.


CHAPTER 9. 

9.1. In class-intervals of 10 lb.

$\mu_2$ = 4.470, $\mu_3$ = 6.927, $\mu_4$ = 89.119; $\beta_1$ = 0.537, $\beta_2$ = 4.461.

Curve leptokurtic.

9.2. 0-00, 0-29, 0*27. 

9.3. $\mu_2$ = 11.375, $\mu_3$ = 12.705, $\mu_4$ = 428.708, in class-intervals of 1 gallon.

$\beta_1$ = 0.110, $\beta_2$ = 3.313.

Measures of skewness are 0.027, 0.14, 0.15. The second is obtained by
approximating to the mode in the manner of 7.26.

9.4. Eefore corrections. //„- 7*301, /*,=--() 100, //, 103-405; 

After corrections, // a 0-551, p , -0 100, 132 975, 

Note that the small negative $\mu_3$ in the finer grouping becomes positive in the
coarser grouping.

9.5. $\mu_3 = npq(q - p)$.

$\mu_4 = 3n^2p^2q^2 + npq(1 - 6pq)$.
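
The standard expressions for the third and fourth moments of the binomial about its mean, $\mu_3 = npq(q - p)$ and $\mu_4 = 3n^2p^2q^2 + npq(1 - 6pq)$, can be confirmed by direct summation. The following is a minimal sketch; the values n = 10 and p = 0.3 are assumptions chosen only for illustration.

```python
from math import comb

def binomial_central_moment(n, p, k):
    """k-th moment about the mean of the binomial (q + p)^n, by direct summation."""
    q = 1.0 - p
    mean = n * p
    return sum(comb(n, r) * p**r * q**(n - r) * (r - mean) ** k
               for r in range(n + 1))

n, p = 10, 0.3  # illustrative values only
q = 1.0 - p
mu3 = binomial_central_moment(n, p, 3)
mu4 = binomial_central_moment(n, p, 4)
print(mu3, n * p * q * (q - p))
print(mu4, 3 * (n * p * q) ** 2 + n * p * q * (1 - 6 * p * q))
```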

9.0. About the mean, // ? - 1 t 75. p , 39 75, /< 4 =.- 142 3125. 

About the origin, //./ 21, ///=100, /'/ 1™-’- 

9.8. This proposition is equivalent to that of Exercise 8.6. For U-shaped
universes /?„ - 2. 

9.9. A 2 -7-057, A, 30 152, / 4 ^ 259 335. 


CHAPTER 10. 


10.1. 27.31 per cent.

10.2. Expected frequencies are: 1, 12, 66, 220, 495, 792, 924, 792, 495, 220,
66, 12, 1.

Expected mean = 0; expected $\sigma$ = 1.732.
Actual mean = 0.139; actual $\sigma$ = 1.712.
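
The expected frequencies in 10.2 are the coefficients of the binomial $(\tfrac{1}{2} + \tfrac{1}{2})^{12}$ scaled to a total of 4096. A short sketch, assuming deviations are measured from the central value 6, reproduces them together with the expected standard deviation $\sqrt{12 \times \tfrac{1}{2} \times \tfrac{1}{2}} = 1.732$.

```python
from math import comb, sqrt

# Binomial coefficients C(12, k): expected frequencies for 4096 trials with p = 1/2.
freqs = [comb(12, k) for k in range(13)]
total = sum(freqs)

# Mean and standard deviation of the deviation (k - 6) from the central value.
mean_dev = sum((k - 6) * f for k, f in enumerate(freqs)) / total
sd = sqrt(sum((k - 6) ** 2 * f for k, f in enumerate(freqs)) / total)

print(freqs)
print(total, mean_dev, round(sd, 3))
```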

10.3. \[ y = \frac{4096}{1.712\sqrt{2\pi}}\; e^{-\frac{(x - 0.139)^2}{2 \times 1.712^2}} \]


Expected frequencies, to nearest unit, are: 2, 11, 51, 178, 438, 765, 951, 841,
529, 236, 75, 17, 3, totalling 4097 (these are obtained by simple interpolation in
Appendix Table 1).

10.4. 17. 

10.5. If p is the expectation of getting an even number. 


10 ('s/'V “2 v j,j C 4 />Y 

Hence, p and the number of times is 10,000(-j) 1 " - once. 

10.8. The frequency of r successes is greater than that of r − 1 so long as
r < np + p; if np is an integer, r = np gives the greatest term and also the mean.
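
The rule in 10.8 can be tested by computing the binomial terms and locating the greatest. This is only a sketch; the pairs (n, p) used are hypothetical and chosen so that no two adjacent terms are exactly equal.

```python
from math import comb

def binomial_terms(n, p):
    """Terms of the binomial (q + p)^n for r = 0, 1, ..., n successes."""
    q = 1.0 - p
    return [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]

def binomial_mode(n, p):
    """Number of successes with the greatest frequency."""
    terms = binomial_terms(n, p)
    return max(range(n + 1), key=lambda r: terms[r])

def rising_matches_rule(n, p):
    """True if term r exceeds term r - 1 exactly when r < np + p."""
    terms = binomial_terms(n, p)
    return all((terms[r] > terms[r - 1]) == (r < n * p + p)
               for r in range(1, n + 1))

print(binomial_mode(10, 0.3), rising_matches_rule(10, 0.3))
print(binomial_mode(12, 0.25), 12 * 0.25)  # np is an integer here
```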

10.9. This follows at once from a consideration of the Galton-Pearson
apparatus.





10.10.

    Binomial.    Normal Curve.
        1             1.7
       10            10.5
       45            42.7
      120           116.1
      210           211.5
      252           258.4
      210           211.5
      etc.          etc.
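
The comparison in 10.10 appears to set the terms of $(1 + 1)^{10}$ against the ordinates of a normal curve of the same total, mean and standard deviation. The sketch below works on that assumption, taking N = 1024, mean 5 and $\sigma = \sqrt{2.5}$.

```python
from math import comb, exp, pi, sqrt

n = 10
N = 2 ** n                      # total frequency, 1024
sigma = sqrt(n * 0.5 * 0.5)     # standard deviation of the binomial with p = 1/2

binom = [comb(n, k) for k in range(n + 1)]
# Ordinates of the normal curve with the same total, mean and standard deviation.
normal = [N / (sigma * sqrt(2 * pi)) * exp(-((k - n / 2) ** 2) / (2 * sigma ** 2))
          for k in range(n + 1)]

for b, y in zip(binom, normal):
    print(b, round(y, 1))
```

The agreement is closest near the centre and poorest, in relative terms, in the tails.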


10.11. Mean 74 8, standard deviation 8*28. 

10.12. About zero mean the deciles are: 0, 0.2533, 0.5244, 0.8416, 1.2816,
and the corresponding negative values.
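
The deciles of the normal curve with unit standard deviation can be recomputed from the inverse of the normal integral. A minimal sketch using the Python standard library:

```python
from statistics import NormalDist

std = NormalDist()  # zero mean, unit standard deviation
# Fifth to ninth deciles; the lower deciles are the corresponding negative values.
deciles = [std.inv_cdf(p / 10) for p in range(5, 10)]
print([round(d, 4) for d in deciles])
```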

10.1 3 . 

•i-57 V / 2n- 

Calculated mean and quartile deviations, 2.05 and 1.73 (observed, 2.02 and
1.75). These figures are in units of one inch.
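
For a normal curve the mean deviation is $\sigma\sqrt{2/\pi} \approx 0.7979\,\sigma$ and the quartile deviation is about $0.6745\,\sigma$. The sketch below applies these factors to $\sigma$ = 2.57 inches, a value assumed here from the fitted curve in this answer.

```python
from math import pi, sqrt
from statistics import NormalDist

sigma = 2.57  # inches; assumed standard deviation of the fitted normal curve

mean_dev = sigma * sqrt(2 / pi)                     # mean deviation of a normal curve
quartile_dev = sigma * NormalDist().inv_cdf(0.75)   # semi-interquartile range

print(round(mean_dev, 2), round(quartile_dev, 2))
```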

10.14. Calculated mean and quartile deviations (years), 0*87 and 5-88 
(observed, 544 and 4*08). 

10.15. 18. 


10.16. rr 2 207 (uneorreefed). 

Theoretical frequencies, 2, 5, 11, 20, 20. 85, 85, etc. 

10 17. Theoretical frequencies, 880*5, 8074, 281-0, 02 5, 27-8, 6 5, 1*8, 0-2. 
10.18. A 2 — 1 862, A a - 1 7<M, A 4 2-510. 


CHAPTER 11. 

11.1. o L 1414, rr v - 2 280, r I 0 HI. 

A 0 5 V i 0 5, V 1 *8A I 14. 

11.2. r (between A r and V) - -0 66; between V and Z -0*00; between 
Z and A -0 18. 

11.1. r — 4 0 06, 

11.5. (1) -041, (2) 4 0 10. 

CHAPTER 12. 

12.3. From equations (12.11) and (12.12) replace $\sigma_1$ and $\sigma_2$ by $\Sigma_1$ and $\Sigma_2$ in
equation (12.10). Regarding this as an equation for r, note that $r^2$ is a maximum
when $\tan 2\theta$ is infinite, or $\theta = 45^\circ$.

12.4. In fig. 12.1 suppose every horizontal array to be given a slide to the right
until its mean lies on the vertical axis through the mean of the whole distribution;
then suppose the ellipses to be squeezed in the direction of this vertical
axis until they become circles. The original quadrant has now become a sector
with an angle between one and two right angles, and the question is solved on
determining its magnitude.

12.5. The ellipse is a horizontal section of the surface. Its equation is

\[ \frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2} = 1 - r^2 \]

and the standard deviations of sections are the square roots of the lengths of
radii vectors of the ellipse.

12.6. The maximum and minimum s.d.’s are given by the principal axes,
which leads to equations (12.11) and (12.12).

For an intermediate value there are two radii vectors and hence two sections. 





12.7. a and b must be negative, and $ab - h^2 > 0$.

2 - _ 1 h „ | . 1 " 

1 ~ *ab-h'' 2 ~ab~h* 

h 

r ~Vab 

CHAPTER 13.

HU. ij ru = 0 212, 0.200. 

Id.2. i ?a ,- 0 82, */ w r0-80. 

13.3. $\rho$ = +0.79.

13.4. If the judges be denoted by 1, 2, 3,

p ia -= - 0 21, p w - -0-00, Pll 1 0-64 

This suggests that judges 1 and 3 have tastes in common, but neither has
much in common with judge 2.

13.5. Q^'2 /3. 

13.6. <j = 0 77. 

13.8. r- \ 0*83. 

13.9. r - | 0-22, 11,868 entries. 

CHAPTER 14.

14.1. r u j = H 0 759, (0 097, ; 2l t -0 400. 

fTj ^, 2*01, cr^. 1{ 0 591, (t <1(J -70 1. 

A x - 9 01 -( a tr-V, ( 0 0030 LY,. 

14.2. « J(21 , 0-80. /f 2(il0 OHi, 0*57. 

11.3. r u , >4 (OOM), ; Jtil t0 803, r u , t 0 397. 

/ 2I14 -0-133, f wl , -0 553, i JI|t -Olltl. 

Oj 917, or 21t34 19 2, <t , l31 125, rr s 12J —105 k 

A, 53 | 0-127A r a I 0 587-Y, I 0 0345A’ 4 . 
lkk h\ (2l) 0*87, li uut) -0-89. 

11.5. (AV 19 9)-4 51 (A,- 49-2) - 0 88( Y, 902) 

- 0 072(A 4 181 1) j 0 03( X & - 41-6). 

r,,j -003. 
r 154 fO-25. 

r 15 M - f 0 23. 

»m..w ■= 0-77- 

11.7. Number of order s n ""‘C, 

Total number n t 2" x - 1} 

r rhis includes ooeflicients of t\pc // J( j and counts J? x[ ) as ddferent from /? 2(1) . 

14.8. The correlation of the pth order is $r/(1 + pr)$. Hence if r be negative,
the correlation of order n − 2 cannot be numerically greater than unity and r
cannot exceed (numerically) $1/(n - 1)$.
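
When every pair of variables has the same correlation r, the usual recurrence for partial correlations, $r_{12.3} = (r_{12} - r_{13}r_{23})/\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}$, reduces at each step to $(\rho - \rho^2)/(1 - \rho^2)$, because all partials of a given order are equal by symmetry. The sketch below, with hypothetical values of r, iterates this and compares the result with $r/(1 + pr)$.

```python
def equicorrelated_partial(r, order):
    """Partial correlation of the given order when all total correlations equal r."""
    rho = r
    for _ in range(order):
        # All partials of the current order are equal, so the recurrence simplifies.
        rho = (rho - rho * rho) / (1.0 - rho * rho)
    return rho

r, p = 0.5, 3  # illustrative values only
print(equicorrelated_partial(r, p), r / (1 + p * r))

# Limiting case for n = 4 variables: r = -1/(n - 1) gives a correlation
# of order n - 2 equal to -1.
print(equicorrelated_partial(-1 / 3, 2))
```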

14.9. r 12 . r _ - J, r lu , ‘r Jul 4 1. 

14.10. r 12t3 — r u , — j 4 — 1, 

CHAPTER 16.

10.1. Estimated true standard deviation 0 91: standard deviation of fluctua¬ 
tions of sampling 9 38. (The latter, which can be independently calculated, is 
too low, and the lormer consequently probably too high. Cf. 19.30.) 

10.2. 0-43. 

16.3. 58 per cent. 

10.4. a i i jV(ai i + cr 8 a )(or a 8 + <?/) 





J.U.O. —rr.-r-— — : rr- 

Va 2 a l a t i» 2 <r a a 
10.6. 0-29. 

The others may be written down from symmetry. 

16.8. (1) No effect at all. (2) If the mean value of the errors in variables is 
d, and in the weights e, the value found for the weighted mean is: 

£ 

The true value + d - r . a x . rr w - _ — ; 

If r is small, d is the important term, and hence errors in the quantities are
usually of more importance than errors in the weights. If r become considerable,
errors in the weights may be of consequence, but it does not seem probable that
the second term would become the most important in practical cases.

16.9. r 1-0 030. 


CHAPTER 17.

17.1. Jane: V-2*58 + l *13(A - 2) 

Quadratic: Y M8 I MJ1(A’ -2) +0 55(A' -2) a 

Cubic: V 1*18 f 0 023(A' -2) 1 0*55(A -2) 2 +0 325(.Y ~2) 3 

Sums of squares of residuals: 5.819, 1.581, 0.003.

17.2. If V is the average number ol children for the duration A to X j 1 
years : 

Line: Y- 3-811 ) 0-887^ -3^ 

Quadratic: V’ 4-351 I 0 887j ^ - 3 ; - 0181/^ 3 ) 

\5 i \5 / 

Cubic: V I •3. - il I 0-365( ^ - 3) - 0-1 :14( ' - 3 j - 0-00361 (* - sj 

For X = 17 the three values are 1*17, 1*08, i 09. 

17.3. y M2. 

17.t. X Gross output per £100 labour, V gloss output. 

Y 18 33 I 0 2375A 0*000055 lOA* 2 


CHAPTER 19.


19.1. Then. M 0, cr^ 1*732: Aetna! Af-0110, a -1 732. 


19.2. (a) 

Then. M 2*5, 

rr 

1 

118: 

Actual M 

2*18, 

(T 

1 

U 

(l>) 

„ M 3, 

a 

1 

•225: 

M 

2 97, 

a 

1 

2« 

(<■) 

„ M - 3 5, 

a 

1 

•323: 

„ M 

3* 17, 

a 

1 

■40. 


19.3. The standard deviation of the proportion is 0.00179, and the actual
divergence is 5.1 times this, and therefore almost certainly significant.

19.4. The standard deviation of the number drawn is 32, and the actual
difference from expectation 18. There is no significance.

19.5. Difference from expectation 7.5; standard error 10.0. The difference
might therefore occur frequently as a fluctuation of sampling.

19.6. Standard error of proportion of bad eggs 1.6536 per cent. A range of
three times this gives range of 7.5 per cent. to 17.5 per cent. approximately.
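
The standard error of a proportion in simple sampling is $\sqrt{pq/n}$. The sketch below assumes, purely for illustration, a sample of 400 eggs of which 12.5 per cent. are bad; these two figures are inferred from the range quoted in the answer and are not stated in it.

```python
from math import sqrt

p = 0.125   # assumed proportion of bad eggs (hypothetical)
n = 400     # assumed sample size (hypothetical)

se = sqrt(p * (1 - p) / n)          # standard error of the proportion
lower, upper = p - 3 * se, p + 3 * se

print(round(100 * se, 4))                        # per cent.
print(round(100 * lower, 1), round(100 * upper, 1))
```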






19.7. The test can be applied either by the formulae of Case 2 (19.28) or
those of Case 3 (19.29). Case 2 is taken as the simplest.

(AB)/(B) = 70.1 per cent.; (Aβ)/(β) = 64.3 per cent.

Difference 5.8 per cent. (A)/N = 67.0 per cent. and thence $\epsilon_{12}$ = 3.40 per
cent. The actual difference is 1.7 times this and might, rather infrequently,
occur as a fluctuation of sampling.

19.1). Difference of proportions , 7 () , € 12 —0*033. Difference significant. 

Similar conclusions follow if the formulae of Case 3 (19.29) are applied.

19.10. Proportion 36 per cent. Limits 32.4 - 39.6 per cent. The sampling
is almost certainly not simple. Possible causes are: (a) nature of subject-matter
might require words of certain type, e.g. scientific words probably would not be
Anglo-Saxon; (b) the occurrence of one word influences the occurrence of the
next.

19.11. If there are $f_1$ samples of $n_1$ individuals each, $f_2$ of $n_2$, etc.,


AV 




l"l 

ll 


19.12. Standard error of expected proportion -23 05 per cent. 

Standard deviation of actual distribution - 23 09 per cent. 

19.13. Standard deviation of simple sampling 23.0 per cent. The actual
standard deviation does not, therefore, seem to indicate any real variation, but
only fluctuations of sampling.

19.1 t. .v 0 -7 02, and nr, 2 5 units. 

19.15. $\sigma^2 = npq$, as if the chance of success were p in all cases (but the mean
is n/2, not pn).

19.10. Mean number of deaths per annum - rr 0 ’ 080, 

rr ’ 500,582 r =-0000029. 


CHAPTER 20. 

20.1. P -O'1773. 

20.2. V _ 0 9595. 

20.3. Median: Estimated frequency - 1551. Standard error 0 28 lb. 

Lower Q: frequency 1 172. Standard error 0-20 lb. 

1’pper fiequencv 1110. Standard error 0 31 lb. 

20.1. 018 lb. 

20.5. 0.24 lb., 14 per cent. less than the s.e. of the median.

20.0. Estimated frequencies: - 07.518, Mi 03,152, Q, 30,188. 

Standard errors (years) 0*011, 0*013, 0 023. 

20.7. Standard error of mean 0.015 years.

20.8. Standard error of quartiles 0.020 years.

20.9. $1.36263\,\dfrac{\sigma}{\sqrt{n}}$.
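
In large samples from a normal universe the standard error of the quantile cutting off a proportion p of the frequency is $\dfrac{\sqrt{p(1-p)}}{y_p}\,\dfrac{\sigma}{\sqrt{n}}$, where $y_p$ is the ordinate of the unit normal curve at that quantile. The sketch below evaluates the numerical factor for the quartiles and, for comparison, for the median; the function name is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def quantile_se_factor(p):
    """Factor c such that s.e. of the p-quantile is c * sigma / sqrt(n) (normal universe)."""
    std = NormalDist()
    ordinate = std.pdf(std.inv_cdf(p))   # height of the unit normal curve at the quantile
    return sqrt(p * (1 - p)) / ordinate

quartile_factor = quantile_se_factor(0.25)
median_factor = quantile_se_factor(0.5)
print(round(quartile_factor, 4), round(median_factor, 4))
```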

20.10. $\epsilon_{12}$ = 1.30 shillings. Difference of means 2 shillings. Difference
hardly suggestive of real effect.

20.12. Yes, one might, because the results on farms in successive years are
correlated.

20.13. Mean = 5-013 ; s.e. of mean 0 10. 

Median =8*128; s.e. of median 0*21. 

20.14. P =-0-300. 

20.15. £450,000; £1,350,000. 

20.10. 0*72 inch. 





CHAPTER 21. 

21.1. Standard error -0-223 lb. 

On basis of normal distribution —0170 lb. 

21.2. 0*011, 0 014. 

21.3. S.e. of s.d. $= 0.707\,\dfrac{\sigma}{\sqrt{n}}$

S.e. of Q $= 0.787\,\dfrac{\sigma}{\sqrt{n}}$
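
Both factors can be recovered for a normal universe in large samples. The s.e. of the standard deviation is $\sigma/\sqrt{2n}$, and the s.e. of the semi-interquartile range Q follows from the variances and covariance of the two quartiles. The sketch below uses the standard large-sample result that n times the covariance of the quantiles at proportions $p_1 < p_2$ is $p_1(1 - p_2)/(y_1 y_2)$, with $\sigma$ taken as unity.

```python
from math import sqrt
from statistics import NormalDist

std = NormalDist()
y = std.pdf(std.inv_cdf(0.75))      # ordinate at either quartile (by symmetry)

sd_factor = 1 / sqrt(2)             # s.e. of s.d. = sigma / sqrt(2n)

var_quartile = 0.25 * 0.75 / y**2   # n * variance of either quartile
cov_quartiles = 0.25 * 0.25 / y**2  # n * covariance of lower and upper quartiles
# Q = (Q3 - Q1) / 2, so n * var(Q) = (2 * var - 2 * cov) / 4.
q_factor = sqrt((2 * var_quartile - 2 * cov_quartiles) / 4)

print(round(sd_factor, 3), round(q_factor, 3))
```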

21.4. Difference of s.d.’s 0.2. On the assumption of normality $\epsilon_{12}$ = 0.088.
Difference might therefore arise, rather infrequently, as sampling fluctuation.

21.5. r - - 0 008 for height distribution, r= + 0-71 for marriage distribution. 


/'» - fh 


n 


2 nr 4 

lor normal curve. 
n 


a 



!U ~//, 2 -0//,/^+9/V 

71 


0(7 ( ' 


n 


for normal curve. 


2 1 

a A4 w !8C/i 2 a (//| ”/E 2 ) 4 (//„ - /h s -S/E/O 

H l(i/i,/y/ -12//,(/<,. -4///)} 

24nr« 

— lor normal curve. 
n 


21.7. For the (ith and lower momenls. 

21.9. Standard errors are 0.0176, 0.0158, 0.0268, and results might all have
arisen from an uncorrelated universe; if the universe were actually uncorrelated,
the standard errors would be the same to the number of places given, owing
to the smallness of r.

21.10. Standard errors 0.0758, 0.1808, 0.0850, and the correlations are all
significant.


CHAPTER 22. 

22.1. $\chi^2$ = 5.811, $\nu$ = 7, P = 0.56.

22.8. f-4 8, v -9, P 0*80. The hypothesis seems reasonable. 

22.5. - 27*94, v 1, P -0 000012. The association is signitiennt. 

22.6. $\chi^2$ = 0.7080, $\nu$ = 1, P = 0.400. The divergences from expectation may
well have arisen by sampling fluctuations.
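
For one and two degrees of freedom the probability P of exceeding a given $\chi^2$ has a closed form: $P = \operatorname{erfc}\sqrt{\chi^2/2}$ for $\nu = 1$ and $P = e^{-\chi^2/2}$ for $\nu = 2$. The sketch below applies the first to the value $\chi^2$ = 0.7080 read from this answer; the function names are hypothetical helpers.

```python
from math import erfc, exp, sqrt

def chi2_tail_1df(x):
    """P(chi-squared > x) for one degree of freedom."""
    return erfc(sqrt(x / 2))

def chi2_tail_2df(x):
    """P(chi-squared > x) for two degrees of freedom."""
    return exp(-x / 2)

p_one = chi2_tail_1df(0.7080)
print(round(p_one, 3))
```

Higher degrees of freedom need the incomplete gamma function or tables such as those in the Appendix.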

22.7. Use the result that for large n, x“ is distributed approximately normally. 

22.8. $\chi^2$ = 27.68, $\nu$ = 4, P = 0.00001. The data are very suggestive of
association.

22.11. $\chi^2$ = 13.15, $\nu$ = 2, P = 0.0014. This is rather low and we suspect the
sampling to be non-random.

22.12. $\chi^2$ = 9.993, $\nu$ = 3, P = 0.018. Not a very good fit. (In this Exercise
the last four frequencies have been grouped together and $\nu$ reduced by unity to
allow for the estimation of the mean of the Poisson distribution.)

22.1*1*. x* =0*4700, v —8, P =-0*9*3 (by direct calculation). 





CHAPTER 23. 


23.1. t = 0.664, $\nu$ = 9, P = 0.738.

The probability that we should get a value of t greater in absolute value is
0.524.

23.2. The differences in the returns, including cost of manure, have mean = 1,
estimated variance = 1.375, t = 1.907, $\nu$ = 4, P = 0.935. Assuming that distribution of differences
is normal, a greater value would arise about 65 times in 1000. There is some
reason for supposing that the increased returns on the better manured plot are
real, and that it would therefore pay to continue the more expensive dressing.

23.3. Applying the t test for two samples,

t = 0.0991, $\nu$ = 14, P = 0.54

There is nothing in this test to suggest that universes were unlike as regards
height.

23.4. z = 0.1701, $\nu_1$ = 9, $\nu_2$ = 5. The difference of standard deviations is not
significant. Coupled with Exercise 23.3, we conclude that there is no ground for
supposing the two universes different as regards height.

23.5. Applying the t test for two samples,

t = 2.683, $\nu$ = 4, P = 0.972

The difference of means is likely to be significant, which supports the
suggestion.

23.6. $z = \tfrac{1}{2}\log_e \dfrac{1 + r}{1 - r} = 0.549$, $\sigma_z = \dfrac{1}{\sqrt{12}} = 0.2887$

The observed deviation is suggestive, but not decisive.
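
The transformation used in 23.6 is $z = \tfrac{1}{2}\log_e\{(1 + r)/(1 - r)\}$, with standard error $1/\sqrt{n - 3}$. The sketch below assumes r = 0.5 and n = 15; both are inferred from the figures 0.549 and $1/\sqrt{12}$ in the answer rather than quoted from the exercise.

```python
from math import log, sqrt

r = 0.5   # assumed sample correlation (hypothetical; gives z close to 0.549)
n = 15    # assumed sample size (hypothetical; n - 3 = 12)

z = 0.5 * log((1 + r) / (1 - r))   # transformed correlation
sigma_z = 1 / sqrt(n - 3)          # standard error of z

print(round(z, 3), round(sigma_z, 4), round(z / sigma_z, 2))
```

On these assumptions z is a little under twice its standard error, which agrees with the verdict of suggestive but not decisive.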

23.8. P = 0.0018. For the standard error formula P = 0.0000078.

23.9. All significant. 

23.10. All significant. 

23.12. Significantly non-linear. 


CHAPTER 24. 

24.1. 0.93877, 0.93823, 0.93822.

24.2. 0.823032, 0.818050, 0.817939. The inclusion of the third difference
affects only the fourth place by a single unit, so we can probably trust the
answer to four figures.

24.3. Using logarithmic interpolation, the successive approximations are:
0.11200, 0.10044, 0.09963. Second difference interpolation using the last three
data only gives 0.09859. It looks as if we could trust the figure as about 0.100
or 0.099.

24.4. 4195, 4443, 4724, 5030, 5380.

24.7. 11.388 approximately.

24.8. Median 4.8924, 4.8809. First decile 1.9474, 1.9572. Ninth decile
8.4286, 8.3733. As we would probably state such figures only to two decimal
places, the median would not be appreciably affected by taking second differences
into account, but the deciles would be slightly corrected.

24.9. Maximum at 1*330, or day 40, 25th July, value 03*7. 

Minimum at 1 *18 4, or day 35*5, 20th 21st January, value 38*0. 

These estimates are very poor. The maximum is actually 03*4 on 15th-17th 
July, and the minimum 37*9 on 8th-12th January. 



INDEX. 


[The references are to pages. The subject-matter of the Exercises given at the ends of Chapters
has been indexed only when such Exercises (or the Answers thereto) give constants for statistical
tables in the text, or theoretical results of general interest; in all such cases the number of the
Exercise cited is given. In the case of Authors' names, citations in the text are given first,
followed by citations of the Authors' papers or books in the list of References. References to
Greek letters follow the references under Roman letters.]


Ability, General, refs., 513. 

Absolute measures of dispersion, 140. 

Accidents, Deaths from (Poisson distribu¬ 
tion), 101. 

- , Frequency - distributions, refs., 500, 

508. 

Achenwall, Gottfried, Abriss der Staatswissenschaft,
footnote, 5.

Addil ive property of y\ 120 1-27. 

Adyanthaya, N. K., refs., Sampling, 523.

Ages at death from scarlet fever (Table
6.11), 100, (fig. 6.11), 101.
of cows correlated with milk-yield; see
Milk-yield.

of husband and wife (Table 11.2), 198;
constants, 220-221; correlation ratios
(Ex. 13.2), 259.

Aggregate of classes, 14.

Agricultural labourers’ earnings; see 
Earnings; minimum wage-rates, 187; 
calculation of mean and standard 
deviation, 130 188; of median and 
mean deviation, 145-140; of quartiles, 

147. 

Agricultural Market Report, data cited 
from (Table 11.7), 208. 

Airy, Sir G. B., Use of term “error of
mean square,” 111.

Aitken, A. C., refs., Applications of generating
functions to normal frequency,
505; fitting polynomials, 511, 515.

Allan, F. E., refs., Fitting polynomials,
515.

Ammon, ()., Hair- and eye-colour data 
cited from (Table 5.2), 00. 

Analysis of variance, 444-449; use in
testing significance of correlation ratios,
453-455; of linearity of regression,
455-456; of multiple correlation coefficient,
456-458.

Analysis Situs, refs., Hotelling, 512.

Anderson, O., refs., Einführung in die
mathematische Statistik, 490; Korrelationsrechnung,
512; correlation, 512.



Animal feeding-stuffs, Index numbers of
prices of, correlated with price-index of 
home-grown oats (Table 11.7), 208; 
215-218. 

Annual value of estates m 1715 (Table 
0.12), 105; (fig. 0.13), 103. 

Approximations in the theory of large 
samples, 879 380. 

Arithmetic mean; see Mean, Arithmetic. 

Array, Def., 190; type of, 190; standard
deviation of, 200, 214, 242, 200-268;
homo- and hetero-scedasticity, footnote,
211; in normal correlation, 230, 232, 284.

Association generally, 34 04; def., 37; 
degrees of, 38; testing by comparison 
of percentages, 39-43; constancy of 
difference from independence values 
for the second-order frequencies, 43; 
coefficients of, 44-15; illusory or mis¬ 
leading, 57 -58; total possible number of 
associations for n attributes, 55-50; 
ease of complete independence, 00 -02; 
use of ordinary correlation* coefficient 
as measure of association, 252 258; 
tetraehoric r as coefficient of association, 
251 252, 253; refs., 499-500, 510. 

Association, Partial generally, 50-04; 
total and partial, def., 50-51; arith¬ 
metical treatment, 52-55; number of 
partial associations for n attributes, 
55 50; testing, in ignorance of third- 
order frequencies, 58-00; refs., 500. 

, Examples: inoculation against cholera, 
40, 42-43; deaths and occupations, 
59-00; deal-mutism and imbecility, 
40-41 ; eye-colour of father and son, 41; 
eye-colour of grandparent, parent and 
offspring, 53 55,00; colour and prickli¬ 
ness of Datura fruits, 44; defects in 
seliool-children, 52 53. 

Asymmetrical frequency-distributions, 94- 
101 ; relative positions of mean, median 
and mode in, 125; diagram, 118; see 
also Frequency-distributions; Skewness. 





Attributes—theory of, generally, 11 81; 
def., 11; numerically defined, 77 78; 
notation, 12-14; positive and negative, 
13; order and aggregate of classes. 14; 
ult imate classes, 15-16; positive classes, 
17; consistence of class-frequencies, 
26-81 (see also Consistence); associa¬ 
tion of, 34-40 (see also Association); 
sampling of, 350-372 (see also Sampling 
of attributes). 

Australian marriages, Distribution of, 06; 
(fig. 6.8), 07; calculation of mean and 
standard deviation, 140-141, 142; of 
third and fourth moments, 158 150, 
160; of ft and [l i% 101; median and 
quartiles given, 161: calculation of 
skewness, 161; of kurtosis, 165; stan¬ 
dard error of mean (Ex. 20.7), 302; of 
median and quartiles (E\s. 20.6 and 
20.8), 302; of standard deviation, 101; 
correlation between errors in mean and 
standard deviation (Ex. 21.5), 412. 

Averages generally, 112 114, def. ,112; 
desirable properties of, 113- 11 t; forms 
of, 114; average in sense of arithmetic 
mean, 114; refs., 501-502. See also 
Mean, Median, Mode. 

Axes, Principal, in correlation, 231 232; 
in fitting straight lines to data, footnote, 
814. 

Bachelier, L., refs., Calcul des probabilités,
405; Le jeu, la chance et le
hasard, 405.

Baker, O. A., refs., Sampling of variables, 
517, 521; of correlation coellieienl, 521. 

Barlow, 1\, Tables of squares, etc., 71; 
refs , 524. 

Barometer heights (Table 6.10), 00; (fig. 
6.10), 09; means, medians and modes of, 
125; modes of, 488. 

Bartlett, M. S., refs.. Sampling (under 
Wishart), 520. 

Bateman, II., refs., Poisson distribution 
(under Rutherford), 506. 

Baten, W. D., refs., Moments, 504, 500; 
frequency-distri blit i<>ns, 507. 

Bateson, W., Data cited from, 44. 

Bayes, T., refs., Doctrine of clmnccs, 521. 

Becker, R., refs., Anwendung der math.
Statistik auf Probleme der Massenfabrikation,
197.

Beetles (Chrysomelidæ), Sizes of genera
(Table 6.13), 106.

Benini, R., refs., Principii di Statistica
Metodologica, 520.

Bennett, T. L., refs., Cost of living, 503. 

Berkson, refs., Bayes’ theorem, 521. 

Bernoulli, James, Binomial distribution, 
100; refs., Ars Conjectandi, 505. 

Bertillon, J., refs., Cours élémentaire de
statistique, 400.

Bertrand, J. L. F., Quotation on chance,
330; refs., Calcul des probabilités, 405.


“Best fit,” of regression lines and polynomials,
as given by method of least
squares, 200-210, 262-264, 311, 313-314.

Beta-function, 444; tables, refs., 525.

Bias in sampling, 336, 337-330, 346 317; 
human bias, 337-330. 
in scale reading (Table 6.4), 86. 

Bielfeld, Baron J. F. von, Use of word
“statistics,” 4.

Binomial distribution, 169-180; genesis 
of, in numbers of trials of events, 160 
170; calculated series for certain values 
of p and n (Tables 10.1 and 10.2), 172; 
general form of, 171 173; mean and 
standard deviation of, 173-174; third 
and fourth moments of, 174; /)'- 

eocllicients of, 171 175 (Tables 10.3- 
10.5); mechanical represent at ion of, 
175 170; deduction of normal curve 
from, 177-180; of Poisson distribution 
from, 187-180; m sampling of attri¬ 
butes, 351, see Sampling of attributes; 
refs., 505 508. 

Birge, R. T., refs., Fitting polynomials, 515.

Birth-rate, Data on (Table 6.1), 83;
standardisation of, 306; refs., 514.

Bispham, J. W., refs., Sampling of partial
correlations, 517.

Bivariate distributions, 106; normal sur¬ 
face, 227-228. 

Blackman, V. H., quoting data of Ashby 
and Oxley on duckweed (Table 17.3),317. 

Blakeman, J., refs., Tests for linearity of
regression, 514, 517; probable error of
contingency coefficient, 517.

Boldrini, M., refs., Variation, 527; Statistica,
520.

Boole, G., refs., Laws of Thought, 400. 

Booth, Charles, on pauperism, 280 200. 

Borel, E., refs., Traité du Calcul des
Probabilités, 520.

Bortkiewicz, L. von, Data of deaths from 
kicks by a horse, as Poisson distribution, 
191; refs., Poisson distribution, 506; 
sampling, 517. 

Bowley, A. L., refs.. Cost of living, 503; 
index-numbers, 503; Prices and Wages, 
503; sampling methods, 516; effect 
of errors on an average, 520; test 
of correspondence between statistical 
grouping and formula*, 520; Edge- 
worth’s contributions to mathematical 
statistics, 521. 

Bravais, A., refs., Correlation, 500. 

Breaking-up a group, in interpolation, 
477 k79. 

British Association, Data cited from, 
Stature (Table 6.7), 04; weight (Ex. 
0.6), 110-111; sec Stature; Weight; 
refs., Reports on Index-numbers, 503; 
mathematical tables, 525. 

Brown, J. W., refs., Index-correlations,
511, 513.





Brown, W., refs., Effect of experimental 
errors on the correlation coefficient, 513; 
The Essentials of Mental Measurement , 
496. 

Brownlee, J., refs., Frequency - curves 
(epidemiology and random migration), 
508. 

Bruns, H., refs., Wahrscheinlichkeitsrechnung
und Kollektivmasslehre, 495.

Brunt, D., refs.. The Combination of 
Observations, 496. 

Burnside, W., refs., Theory of Probability, 
495. 

Cambridgeshire, Mortality in, 408.

Camp, B. H., refs., Normal hypothesis,
505; integrals for point binomial and
hypergeometric series, 507; correlation,
511; Tchebycheff's inequality, 521;
sampling, 521.

Cantelli, F., refs., Interpolation, 526;
probability, 527; variation, 527.

Cards, Punched, for recording of data,
76-77; for sampling, 310.

Carroll, Lewis (pseudonym), Kx. 1.10 
cited from, 2L 

Carver, II. C., refs., Sampling, 518. 

Castellano, V., refs., Variation and concentration,
527.

Cause and effect, 2 3. 

Cave, Beatrice M., refs., Correlation, 
512. 

Cave-Browne-Cave, F. K., refs., Correla¬ 
tion, 512. 

Cells, in $\chi^2$ test, 413-414.

Census (England and Wales), Tabulation 
of infirmities in older, 22; data as to 
infirmities cited from, 10; classifica¬ 
tion of occupations, as example of a 
heterogeneous classification, 75; data 
us to deficiency in room space, quoted 
from Housing Report, 77; classification 
of ages, 86; data as to number of males 
cited from, 481; refs., 501. 

Chance, in sense of complex causation, 38, 
of success or failure of an event, 169 
170, 350; in definition of “random¬ 
ness,” 336. 

Chances, Small, 191; see Poisson distribu¬ 
tion. 

Charlier, C. V. L., Check, in calculation of
moments, 156; alternative approach
in sampling of attributes, 368-369;
refs., Theory of frequency curves,
resolution of a compound normal curve,
507.

Chebycheff, Chebysheff, see Tchebycheff.

Cheshire, L., refs., Sampling of correlation
coefficient, 522.

Chi-squared, see $\chi^2$.

Childbirth, Deaths in. Application of 
theory of sampling (Table 19.1), 361. 
368-305. 

Chokhate, J., see Shohat, J.



Cholera and inoculations, Illustrations, 
40,42,420, 426-427. 

Chotimsky, V., refs., Curve fitting, 515.

Chrysomelidæ, Distribution of size of
genus (Table 6.13), 106.

Chuproff, Chuprow, see Tschuprow.

Church, A. E. R., refs., Sampling from
U-shaped population (under Holzinger),
518; sampling moments, 522.

Class, in theory of attributes, 12; class 
symbol, 12; class-frequency, 18; posi¬ 
tive and negative classes, 13-14; order 
of a ( lass, 14; ultimate classes, 15 16. 

Class-interval, Del’., 82-83; choice of 
magnitude and position, 85-80; desir¬ 
ability of equality of intervals, 82, 88- 
89, mllucnce of magnitude on mean, 
118, 139-120; on standard deviation, 
141; on third and fourth moments, 
160. 

Classification generally, 11-12; by dichotomy,
def., 32; manifold, 65-81;
homogeneous and heterogeneous, 74-75;
as a series of dichotomies, 75-76; of
data on punched cards, 76-77; of a
variable for frequency-distribution or
correlation table, 82-88, 197-198.

Closeness of fit, see Fit, $\chi^2$.

Cloudiness at Greenwich (Table 6.14), 
106; (fig. 6.15), 101. 

Coefficient of association, 44, 45, 55, 
(standard error) 110; of contingency 
(Pearson’s), 68- 69, (standard error) 410, 
(Tschuprow’s) 70; of variation, 149- 
150, (standard error) 405 406; of rank 
correlation, 246 249, (standard error) 
110; of correlation, partial correlation, 
multiple eoirclation, .see Correlation. 

Colcnrd, C. G., see Reining. 

Colours, Naming a pair, Example of 
contingency (Ex. 22.5), 432. 

Complete function, refs., Tables, 525. 
elliptic integrals, refs., Tables, 525. 

Complex frequency-distributions, 103,105. 

Concentration, refs., 527.

Condon, E., refs., Curve fitting, 515.

Connor, R. L., refs., Tests of correspondence
between statistical grouping and
formulæ (under Bowley), 520.

Consistence of class-frequencies generally,
26-31; def., 26; conditions for, 27;
conditions for, in the case of positive
class-frequencies, 27-29; refs., 499.

Consistence of correlation coefficients, 
280 281. 

Constrained data, in Lexis’ sense, 869. 

Constraints, in $\chi^2$ distribution, 414-415;
linear constraints, 115. 

Contingency, Coefficient of (Pearson's), 
68-69; (Tschuprow’s), 70; relationship 
with normal correlation, 289; standard 
error of, 410; refs., 500 501, 517-520. 

Contingency tables, Def., 65 66; treat¬ 
ment of, by elementary methods, 67; 





isotropy, 72-74, 237-239; degrees of 
freedom in, 415-410; testing of diverg¬ 
ence from independence, 418 421. 

Contrary classes and frequencies, for 
attributes, 13; ease of equality of 
contrary frequencies (Exs. 1.6 and 1.7), 
23; (Ex. 2.8), 32; (Exs. 4.7, 1.8 and 
4.9), 04. 

Correction of correlation coefficients for 
errors of observation, 298-299; for 
grouping, 221 222. 

— of death-rates, etc., for age and sex 
distributions, 305 300; refs., 511. 

— of standard deviation, for grouping of 
observations, 141; of moments, 100,
399; comparison of corrections with
sampling effects, 402; refs., 501-505.

Correlation generally, 190-308; con¬ 
struction of tables, 190-198; repre¬ 
sentation of bivariate frequency-dis¬ 
tribution by surface and stereogram, 
198-204, by scatter diagram, 205-206;
treatment of table by coefficient of 
contingency, 200. 

Product-moment correlation coefficient,
209-213; def., 209; equations and
lines of regression, 209-211; linear and
curvilinear regression, 207, 212-213;
coefficients of regression, 213, standard
deviations of arrays, 211, 212; calculation
of correlation coefficient, for ungrouped
data, 214-215, 215-218; for
grouped data, 218-221; effect of fluctuations
of sampling on, 221; correction for
grouping, 221; elementary methods for
cases of curvilinear regression, 242-
243; rough methods for estimating
coefficient, 241-242; correlation ratios,
243-246; effect of errors of observation
on the coefficient, 298-299.

Rank correlation coefficient, 246-
251; relationship with product-moment
coefficient, 249; grade correlation, 249-
251; tetrachoric r, 251-252; coefficient
for a fourfold table, direct, 252; intraclass
correlation, 253-258; expression
for coefficient, 256-258; limits to
negative values of, 256-257; correlation
between indices, 300-301; correlation
due to heterogeneity of material, 301;
effect of adding uncorrelated pairs to
a given table, 301-302; application to
theory of weighted mean, 302-305;
correlation coefficient in theory of
sampling, 407-408; small samples,
449-453; refs., 509-511, 517-524; for
Illustrations, Normal, Partial, Ratio, see
below.

Correlation, Illustrations and Examples. 
Correlation between; 

Two diameters of a shell (Pecten),
(Table 11.1), 197; constants (Ex. 11.3), 
225. 

Ages of husband and wife (Table
11.2), 198; constants, 220-221; correlation
ratios (Ex. 13.2), 259.

Statures of father and son (Table
11.3), 199; (fig. 11.3), facing 204; (fig.
11.8), 211; constants (Ex. 11.3), 225;
correlation ratios, 246; testing normality
of table, 232-239; diagram of
diagonal distribution (fig. 12.2), 234,
of contour lines fitted with ellipses of
normal surface (fig. 12.3), 236.

Age and yield of milk in cows (Table
11.4), 200; (fig. 11.9), 212; constants
(Ex. 11.3), 225; correlation ratios
(Ex. 13.1), 259.

Discount rates and percentage of 
reserves on deposit (Table 11.5), 201; 
(fig. 11.2), facing 204.

Sex-ratio and numbers of births in 
different districts (Table 11.6), 202; 
(fig. 11.10), 213; constants (Ex. 11.3), 
225; correlation ratios, 246.

Monthly index-numbers of prices of
animal feeding-stuffs and home-grown
oats (Table 11.7), 203; scatter diagram
(fig. 11.1), 205; constants, 215-218.

Length of mother- and daughter-
frond in Lemna minor, 218-220.

Weather and crops, 291-292.

Movements of infantile and general 
mortality, 292-294.

Movements of marriage rate and 
foreign trade, 294-296.

Earnings of agricultural labourers, 
pauperism and out-relief (Ex. 11.2), 
221; partial correlations, 270-272;
geometrical representation (fig. 14.1),
276. 

Changes in pauperism, out-relief, proportion
of old and population, 288-291;
partial correlations, 272-275.
Correlation, Normal, 227-240, 282-286;
deduction of expression for two variables,
227-229; homoscedasticity and
linearity of regression, 229-231; contour
lines, 230-231; normality of linear
functions of normally distributed variates,
231; principal axes, 231-232;
testing of correlation table for stature,
232-237; isotropy of normal correlation
table, 237-239; relationship with contingency,
239; outline of theory for
any number of variables, 282-286;
coefficient for a normal distribution
grouped to a fourfold form round the
medians (Sheppard’s theorem), (Ex.
12.4), 240; refs., 509-511.
Correlation, Partial, 261-287; the problem,
partial regressions and correlations,
261-262; notation and definitions,
263-264; normal equations, fundamental
theorems on product-sums, 262-
263, 265-266; meaning of generalised
regressions and correlations, 266; reduction
of standard deviation, 266-268,



of regression, 268-269, of correlation,
269; arithmetical treatment, 269-275;
representation by a model, 275-277;
coefficient of multiple correlation, 277-
279; expression of correlations and
regressions in terms of those of higher
order, 279-280; consistence of coefficients,
280-281; fallacies, 281-282;
limitations in interpretation of the
partial correlation coefficient, partial
association and partial correlation, 282;
partial correlation in the case of normal
distribution of frequency, 284; refs.,
511-512.

Correlation ratios, 243-246; relation with
measure of closeness of fit of simple
curves, 329; standard error, 409; test
of significance of, 453-455; partial, 282;
refs., 510, 511, 514.

Cosin, Values of estates in 1715 (Table
6.12), 105.

Cost of living, refs., 508.

— per unit of electricity, see Electricity.

Cotsworth, M. B., refs., Multiplication
table, 524.

Courts, J. K. H., Data quoted from (Table
17.5), 322.

Cows, Distribution according to age and
milk-yield, see Milk-yield. 

Craig, C. C., refs., Semi-invariants, 505, 518;
sampling, 518, 522.

Cramér, H., refs., Series used in mathematical
statistics, 507; Random Variables
and Probability Distributions, 529.

Crawford, G. E., refs., Proof that arithmetic
mean exceeds geometric mean, 502.

Crelle, A. L., refs., Multiplication tables, 525.

Criminals, Relation between weight and 
mentality (Table 5.0), 78.

Crops and weather, Correlation, 291-292.

Cunningham, E., refs., Omega-functions,
507. 

Curve fitting, General, 309-331; the
problem, 309-311; method of least
squares, 311-313; equations for fitting
polynomials, 312-313; equations for
straight line, 313-314; calculation,
314-315; reduction of data to linear
form, 316-320; fitting of more general
polynomials, 320-324; case when independent
variable proceeds by equal
steps, 325-327; calculation of sum of
squares of residuals, 327-328; measurement
of closeness of fit, 328-329;
relationship of measure with correlation
ratio and multiple correlation coefficient,
329; general remarks, 324, 329; refs.,
514-515.

Curve fitting, Illustrations and Examples: 

Estimated distance and velocity of
recession of extra-galactic nebulae (Table
17.1), 309-310; (fig. 17.1), 310; straight
line fitted to, 315-316; measure of fit,
329.


Growth of duckweed (Table 17.3), 317;
(fig. 17.2), 317; logarithmic curve fitted
to, 316-318.

Working costs per unit and units
sold per head of population in certain
Electricity Undertakings (Table 17.4),
320; curve fitted logarithmically, 318-
321; (figs. 17.3 and 17.4), 319 and 321.

Temperature and loss in weight in soil
(Table 17.5), 322; parabolas fitted to,
320-324; (fig. 17.5), 324; sum of squares
of residuals, 327-328; closeness of fit, 329.

Growth of population in England and
Wales (Table 17.6), 326; parabola
fitted to, 325; (fig. 17.6), 326.

Curvilinear regression, see Regressions.

Czuber, E., refs., Wahrscheinlichkeitsrechnung,
490, 505; Die statistische
Forschungsmethode, 490.

Darbishire, A. D., Data cited from, 130,
(Exs. 19.42 and 19.18), 372; refs., illustrations
of correlation, 509, 510.

Darmois, G., refs., Time series, 512;
Statistique mathématique, 490.

Data, Remarks on collection of, 6-7; on
treatment of, 7; on summarisation of,
7-8; on analysis of, 8-9.

Datura, Association between colour and
prickliness of fruit, 44, 432 (Ex. 22.6).

Davenport, C. B., Data as to Pecten cited
from (Table 11.1), 197. 

David, Census of Israelites, footnote, 2.

David, F. N., refs., Tables of the Correlation
Coefficient, 529.

Davis, H. T., refs., Curve fitting, 515;
(Editor) Tables of Higher Mathematical
Functions, 525.

Day, K. E., refs., Statistical Analysis , 490. 

De Finetti, B., refs., Variation, 527.

De Morgan, refs., Formal Logic, 490.

De Vergottini, M., refs., Variation, 527.

De Vries, H., Data cited from (Ex. 6.5
(d)), 110.

Deaf-mutism, Association with imbecility,
40-41, 45; frequency among offspring
of deaf-mutes (Ex. 6.5 (b)), 109.

Deaths or death-rates, Association with
occupation (partial correction for age-
distribution), 59-60; from scarlet fever
(Table 6.11), 100, (fig. 6.11), 101;
infantile and general, correlation of
movements, 292-294; standardisation
of, for age- and sex-distribution, 59-60,
305-306, refs., 514; application of
theory of sampling, deaths from accident,
359; deaths in childbirth, 363-
365, (Table 19.1), 364; deaths from
explosions in mines, 367-368; inapplicability
of the theory of simple sampling
to, 357-359; mortality in Cambridgeshire,
408.

Deciles, 150-151; standard error of, 
380-382.




Defects, in school-children, association of,
Hi, 52-53; refs., 490.

Degree of a fitted curve, 310. 

Degrees of freedom, in χ² test, 415-416;
in estimates from small samples, 436-
437.

Deming, W. E., Lola S. Deming and C. G.
Colcord, Tables of z-integral, 444, and
Appendix Table 6C.

Demoivre, A., Discoverer of normal dis¬ 
tribution, 169. 

Dependent variable, in curve fitting, 
313-314.

Design of statistical inquiries, in sampling, 
335. 

Detlefsen, J. A., refs., Fluctuations of
sampling in Mendelian population, 516.

Deviation, Mean, 134; generally, 144-147;
def., 144; is least round the median,
145; calculation of, 147, (Ex. 8.11), 153;
comparison of magnitude with standard
deviation, 146-147, 182; of normal
curve, 182.

—, Quartile; see Quartiles.

—, Root-mean-square; see Deviation,
Standard. 

—, Standard, 131; def., 134-135; relation
to root-mean-square deviation
about any origin, 135-136; is least
possible root-mean-square deviation,
136; little affected by small errors in
the mean, 136; calculation from ungrouped
data, 135-138; for grouped
data, 138-141; influence of grouping,
141; range of six times the s.d. includes
the bulk of the observations, 142; of a
series compounded of others, 142-143;
of N consecutive natural numbers, 143;
of rectangular distribution, 143; of
arrays in theory of correlation, 206, 214,
212; of generalised deviations (arrays),
264, 266-267; other names for, 144;
of a sum or difference, 297-298; effect
of errors of observation on, 298; of
an index, 299-300; of binomial series,
174; of Poisson distribution, 189. For
standard deviations of sampling, see
Error, Standard.

Dice, Records of throwing (Table 6.15 and
fig. 6.16), 107; (Ex. 10.2), 193; testing
for significance of divergence from
theory, 351-353, 419-420, 423-424;
refs., 516-517.

Dickson, J. D. Hamilton, Normal correlation
surface, 237; refs., normal
correlation, 509.

Difference method in correlation, 292-296,
477; refs., 512-513.

Differences, in interpolation, 462-464;
effect of errors in u on, 473-477; effect
of subdividing an interval on, 477.

Discounts and reserves in American banks
(Table 11.5), 201; (fig. 11.2), facing 204.

Dispersion, Measures of, 112, 134-153;
absolute measures of, 149; range as a
measure, 134; in Lexis’ sense, normal,
subnormal and supernormal, 369; refs.,
503-505. See Deviation, Mean; Deviation,
Standard; Quartiles.

Distance-velocity relation in extra-galactic
nebulae, 309-310, (Table 17.1), 309,
(fig. 17.1), 310; straight line fitted to,
315-316.

Distribution of frequency; see Frequency- 
distributions; sampling, see Sampling.

Dodd, E. L., refs., Frequency-curves, 507;
sampling, 518, 522. 

Doodson, A. T., refs., Mode, median and 
mean, 502. 

Duckweed, Correlation between mother-
and daughter-frond, 218-220; growth
of, curve fitted to, 316-318.

Duncker, G., Relation between geometric
and arithmetic mean (Ex. 8.12), 153. 

Dunlap, H. F., refs., Sampling from
rectangular populations, 518. 

Earnings of agricultural labourers, Correlation
with pauperism and out-relief,
data (Ex. 11.2), 221; partial correlations,
270-272; diagram of model (fig.
14.1), 276.

Edgeworth, F. Y., Dice-throwing 
(Weldon), 107; refs., geometric mean, 
502; index-numbers, 503; normal law 
and frequency-curves generally, 505, 
506, 507, 508; dissection of normal 
curve, 508; correlation, 509-511; 
theory of sampling, probable errors, 
etc., 516-518; Edgeworth's contributions
to mathematical statistics, see
Bowley.

Efficient estimates, 428.

Elderton, E. M., refs., Variate difference
correlation method (under Pearson),
518; sampling (under Pearson), 523.

Elderton, W. P., Tables of χ², 425; refs.,
calculation of moments, 504; table of
powers, 525; Frequency Curves and
Correlation, 496, 504, 505, 529.

Electricity Commission, Data quoted from 
returns for 1933-34 (Table 17.4), 320.

Electricity, Curve fitted to costs per unit
and number of units sold per head of
population for certain Undertakings,
318-320, (Table 17.4), 320, (figs. 17.3
and 17.4), 319, 321.

Elliptic integrals, Tables of, refs., 525.

Engineering, Applications of statistical 
method, refs., 497. 

Engledow, F. L., Data cited from, (Table 
23.2), 416. 

Epidemiology, Applications of statistical 
method to, refs., 508.

Error function, 183; see Normal
distribution.

Error, Law of; errors, curve of, see Normal 
distribution. 



Error, Mean, 144.

—, Mean square, 144. 

— of mean square, 144. 

Error, Probable, in theory of sampling,
353-354. For general references, see 
Error, Standard. 

Error, Standard, def., 353, 380; of number
or proportion of successes in n events,
351; when numbers in samples vary
(Ex. 19.11), 372; when chance of success
or failure is small, 350; of percentiles,
median, quartiles, etc., 380-382; of
semi-interquartile range, 385-386; of
arithmetic mean, 380; of variance, 399;
of standard deviation, 399-402; of
coefficient of variation, 405-406; of
moments about fixed point, 395-396;
of moments about the mean, 397; of
third and fourth moments about the
mean, 403-404; of β₁ and β₂, 400;
of coefficients of correlation and regression,
407-409; approximate formula for
correlation ratio and caution in case of
multiple correlation coefficient, 409;
of coefficient of association, 410; of
coefficient of mean square contingency,
410; absence of, in certain cases for
rank correlation coefficient, 410; refs.,
516-520. See also Sampling, Theory of.

Error, Theory of; see Sampling, Theory of.

Estates, Value of, in 1715; see Value.

Estimates, Precision of, 335; efficient,
428; in small samples, 434; of arithmetic
mean, 434-435; of variance, 435-
436; degrees of freedom of, 436-437.

Estimation, Theory of, 331-335; of
theoretical frequencies in the χ² test,
427-428; of position of maximum,
487-488.

Exclusive and inclusive notations for 
statistics of attributes, 22.

Existent universes, 333. 

Experiments on χ² test, 429-430.

Explosions in coal mines, Deaths from, 
as illustrating theory of sampling, 
367-368.

Eye-colour, Association between father
and son, 41, 45, 73-74; association
between grandparent, parent and
child, 53-55, 60; contingency with hair-
colour, 66-67, 70-71; non-isotropy of
contingency table for father and son,
73-74.

Ezekiel, M., refs., Correlation, 511;
sampling and curvilinear regression,
522; Methods of Correlation Analysis,
490.

Falkner, R. P., refs., Translation of
Meitzen’s Theorie der Statistik, 498.

Fallacies in interpreting associations,
Theorem on, 56-57; illustrations, 57-58;
owing to changes of classification, actual
or virtual, 75; in interpreting correlations,
281-282; “spurious” correlation
between indices, 300-301; correlation
due to heterogeneity of material, 301.

Farm Economics Branch, School of Agriculture,
Cambridge, data cited from
records of, (Ex. 17.4), 330. 

Fay, E. A., Data cited from Marriages of
the Deaf in America (Ex. 6.5 (b)), 109.

Fechner, G. T., refs., Frequency-distributions,
averages, measures of dispersion,
etc., 501, 503; Kollektivmasslehre, 501.

Fecundity of brood-mares (Table 6.9), 98,
(fig. 6.9), 98; mean, median and mode
(Ex. 7.4), 132; inheritance, refs., 513. 

Fegiz, P. E., Data cited from, (Ex. 17.2), 330.

Feldman, H. M., refs., Sampling, 518.

Field experiments, refs., 497. 

Fieller, E. C., refs., Sampling distribution
of an index, 518. 

Filon, L. N. G., refs., Probable errors (under
Pearson), 519.

Finite and infinite universes, 332-333. 

Fisher, A., refs., Mathematical Theory of
Probabilities, 490. 

Fisher, Irving, refs., Index-numbers, 503. 

Fisher, R. A., Criticism of use of standard
error in test of linearity of regression,
409; tables of χ², 418, 425; normality
of χ² for large ν, 422; tables of t, 439-
440; data cited from, 442-443; application
of t-distribution to regressions,
443; distribution of correlation coefficient,
449; transformation of, 451;
refs., goodness of fit of regression
lines, 510; curve fitting, 515; sampling
of correlation coefficient, 518, 522;
moments of sampling distributions,
518; χ² distribution, 520-521; tests of
agreement between observation and
hypothesis, 521; sampling theory, 522;
extremes of sample, 522; statistical
estimation, 522; t-distribution, 522;
Statistical Methods for Research Workers,
490, 529; Statistical Tables for Biological,
Agricultural and Medical Research,
529.

Fisher’s z-distribution, 443-444; Tables,
444, and Appendix Tables 6; use in
analysis of variance, 448; in testing
significance of correlation ratios, 453-
455; significance of linearity of regression,
455-456; significance of multiple
correlation coefficient, 456-458.

Fit of simple curves to data, see Curve 
fitting; measure of closeness of fit, 
for simple curves, 328-329; “best” fit,
“closest” fit, as given by method of
least squares, 209-210, 262-264, 311-
314; goodness of fit, see χ² distribution.

Flux, Sir A. W., refs., Measurement of 
price-changes, 503. 

Food, Drink and Tobacco Trades, Data on
size of firms in, (Ex. 6.5 (a)), 109.

Footrule, Spearman’s, footnote, 249.



Forcher, H., refs., Die statistische Methode
als selbständige Wissenschaft, 490.

Fountain, Sir Henry, refs., Index-numbers 
of prices, 503. 

France, Anatole, Remark about the 
Chinese, 2. 

Freedom, Degrees of; see Degrees of
freedom. 

Frequency of a class, 13, 83. 

Frequency-curve, Def., 92-93; ideal
forms of, 93-104; refs., 501, 507-508;
see Normal distribution.

Frequency-distributions, 82-83; formation
of, 85-89; graphic representation
of, 90-92; ideal forms, symmetrical,
93-94, moderately asymmetrical, 94-98,
extremely asymmetrical (J-shaped), 98-
101, (U-shaped), 101-102; truncated
distributions, 102-103; complex distributions,
103-104; pseudo-frequency
distributions, 104, 108; reduction to
absolute scale, 150; theoretical, 169;
binomial distribution, 169, 169-180;
normal distribution, 180-187; Poisson
distribution, 187-191; refs., 501, 507-
508. See also Binomial distribution;
Normal distribution; Poisson distribution;
Pearson curves; Correlation,
Normal.

Frequency-distributions, Illustrations:
Birth-rates in England and Wales, 83;
stigmatic rays on poppies, 84; lengths
of screws, 84; final digits in measurements,
86; persons liable to sur- and
super-tax in the United Kingdom, 89;
head-breadths of Cambridge students,
90; statures of males in the United
Kingdom, 94; Australian marriages,
96; fecundity of brood-mares, 98;
barometer heights at Greenwich, 99;
ages at death from scarlet fever, 100;
annual value of estates in 1715, 105;
degrees of cloudiness at Greenwich,
106; sizes of genera in Chrysomelidae,
106; dice-throwing, 107; male deaths
in England and Wales, 107-108; size
of firms in Food, Drink and Tobacco
Trades (Ex. 6.5 (a)), 109; percentage
of deaf-mutes in offspring of deaf-mutes
(Ex. 6.5 (b)), 109; yield of grain (Ex.
6.5 (c)), 110; petals in the buttercup,
Ranunculus bulbosus (Ex. 6.5 (d)), 110;
weights of males in the United Kingdom
(Ex. 6.6), 110; wheat shoots (Table
18.1), 338. See also Correlation, Illustrations
and examples.

Frequency-polygon, Construction of, 90. 

Frequency-surface, Forms and examples
of, 196-202; (figs. 11.1, 11.2, and 11.3),
204, and facing 204; see Correlation,
Normal.

Frisch, R., refs., Difference equations and
frequency-distributions, 507; correlation,
509; time series, 512.


Fry, T. C., refs., Probability and its Engineering
Uses, 497.

Fundamental sets, specifying data, 17.

Gabaglio, A., refs., Teoria generale della
statistica, 498.

Galton, Sir Francis, Ogive curve, 150-151;
binomial apparatus, 175-176; regression,
207; Galton's function (correlation
coefficient), 242; normal correlation,
237; data cited from, 41, 53, 73;
refs., geometric mean, 502; percentiles,
504; binomial machine, 506; correlation,
509; correlation between indices,
513; Natural Inheritance, 504, 506.

Galvani, L., refs., Means, 527; variation
and concentration, 527. 

Gamma-functions, refs., Tables, 525.

Gauss, C. F., Normal distribution, 169; 
use of term “mean error,” 144. 

Geary, R., refs., Frequency-distributions,
507.

Geiger, H., refs., Poisson distribution
(under Rutherford), 506.

Geometric mean; see Mean, Geometric. 

Gibson, Winifred, refs., Tables for comput¬ 
ing probable errors, 518. 

Gini, C., refs., Index-numbers, 503; curve
fitting, 515; general, 526; interpolation,
526; means, 527; probability, 527;
variability, 527; index-numbers, 528;
statistical relations, 528; Appunti di
Statistica Metodologica, 526; (Ed.) Trattato
Elementare di Statistica, 526.

Goodness of fit, 430; see χ² distribution.

Grades, 150; grade correlation, 249-251;
relationship with ranks and rank
correlation, 249-251; see Ranks.

Graduation, 480-485; see Interpolation.

Gram, J. P., refs., Expression of functions
in series by least squares, 515.

Graphic method of representing frequency-
distribution, 90-92; of interpolating
for median and percentiles, 121-122,
150; of representing correlation between
two variables, 205-206; of estimating
correlation coefficient, 241-242; refs.
(Italian), 526.

Graunt, John, refs., Observations on the 
Bills of Mortality , (under Hull, C. H.), 
498. 

Gray, John, Data cited from, 361. 

Greatest and least value of sample, refs., 
522 (Dodd), 522 (Fisher and Tippett). 

Greenleaf, H. E. H., refs., Curve fitting,
515.

Greenwood, M., Data cited from, 40, 42, 
(Table 10.3), 175; use of principal axis 
in curve fitting, footnote, 314; refs., 
inoculation statistics and association, 
499; Poisson distribution, 506; multiple 
happenings, 508; index correlations 
(under Brown), 511, 513; errors of 
sampling, 516. 



Group, Breaking-up of, in interpolation,
477-478; formula for halving of, 479-
480.

Grouping of observations to form a 
frequency-distribution, Choice of class- 
interval, 82-83; influence of grouping 
on mean, 118, 119-120; influence on 
standard deviation, 141; influence on 
higher moments, 100. 

Growth of duckweed (Table 17.3), 317;
curve fitted to data, 316-318; (fig.
17.2), 317; of population (Table 17.6),
326; curve fitted to data, 325; (fig.
17.6), 326.

Hair-colour and eye-colour, Example of
contingency, 66-67, 70-71; non-isotropy,
71-72; theory of sampling
applied to certain data, 361-362.

Hall, Sir A. D., Data cited from, (Ex. 6.5
(c)), 110.

Hall, Philip, refs., Partial correlation, 511; 
distribution of means from rectangular 
universe, 522. 

Halving a group, in interpolation, 479-
480.

Harmonic mean; see Mean, Harmonic. 

Harris, J. A., refs., Short method of calculating
coefficient of correlation, 511;
intraclass coefficients, 514; correlation,
miscellaneous, 512.

Hart, U., refs., Effect of errors on correla¬ 
tion, 513. 

Head-breadths of Cambridge students
(Table 6.6), 90; (figs. 6.1 and 6.2), 91.

Height, Distribution of men according to; 
see Stature.

— distribution of wheat plants (Table
18.1), 338.

Helguero, F. de, refs., Dissecting normal
curves, 508. 

Hendricks, W. A., refs., Curve fitting, 515.

Henry, A., refs., Calculus and Probability , 
495. 

Heron, D., refs., Association (under 
Pearson), 499; relation between fer¬ 
tility and social status, 512; defective 
physique and intelligence, application 
of correction for age-distribution, 514; 
abac for giving probable errors of 
correlation coefficients, 518; probable 
error of partial correlation coefficient, 
518. 

Heteroscedastic arrays, footnote, 214.

Hilton, John, refs., Sampling inquiry, 516. 

Histogram, Construction of, 90-91. 

History of statistics generally, 4-5; refs., 
498. 

Hojo, T., refs., Sampling distribution of 
medians, quartiles, etc., 518. 

Hollis, T., cited re Cosin’s “ Names of the 
Roman Catholics , etc.,” 105. 

Holzinger, K. S., refs., Sampling from
U-shaped universe, 518.


Homoscedastic arrays, footnote, 214.

Hooker, R. H., Correlation between
weather and crops, 291-292; between
movements of two variables, 294-296;
refs., theory of partial correlation, 511; 
correlation between movements of two 
variables, 512; between weather and 
crops, 512; between marriage rate and 
trade, 512. 

Horst, P., refs., Evaluation of multiple
regression coefficients, 511. 

Hotelling, H., refs., History, 498; limits
to skewness, 505; analysis situs, 512;
time series (under Working), 513;
sampling of correlation ratio, 518; 
optimum statistics, 518; generalisation 
of “Student’s” distribution, 522; samp¬ 
ling of rank correlation coefficient, 522. 

Houses, Inhabited and uninhabited, in 
rural and urban districts (Ex. 5.2), 80. 

Hubble, Edwin, Data cited from, (Table
17.1), 309.

Hull, C. H., refs., The Economic Writings 
of Sir William Petty, together with
Observations on the Bills of Mortality 
more probably by Captain Graunt , 498. 

Human bias, in sampling, 337-339.

Humason, M. L., Data cited from, (Table
17.1), 309.

Husbands and wives, Correlation between 
ages of (Table 11.2), 198; constants, 
220-221; correlation ratios (Ex. 13.2),
259. 

Hypergeometrical series, refs. (Karl Pearson),
500; (Camp) 507.

Hypothetical universe, 333; sampling 
from, 345-346.

Illusory associations, 57-58. 

Imbecility, Association with deaf-mutism,
40-41, 45.

Inclusive and exclusive notations for
statistics of attributes, 22. 

Incomes liable to sur- and super-tax; see 
Sur- and super-tax. 

Incomplete beta-function, tables, refs.,
525; gamma-function, tables, refs., 525;
elliptic integrals, tables, refs., 525.

Independence, Criterion of, for attributes,
34-35; case of complete, for attributes,
60-62; form of contingency or correlation
table in case of, 74; χ² test for,
418-430.

Independent variable in curve fitting, 
313-314. 

Index-numbers of prices, 129-130; use of 
geometric mean for, 129-130; of animal 
feeding-stuffs and home-grown oats 
(Table 11.7), 203; correlation between, 
215-218; refs., 502-503, 528.

Indices, Correlation between, 300-301;
refs., 513-514.

Infinite and finite universes, 332-333;
sampling from, 344-345. 


Inoculation against cholera, Examples,
40, 42-43, 420, 426-427.

Inoculation against tuberculosis in cattle,
Example, 425-426.

Interclass correlation, 254; see Correlation.

Intermediate observations in a frequency- 
distribution, Classification of, 85, 87-88; 
in correlation table, 197-198.

Interpolation and graduation, generally,
462-493; simple interpolation, 462;
differences, 462-464; Newton’s formula,
464-468; interpolation of statistical
series, 468-470; practical work, 470-
473; number of differences to use,
470-471; choice of set of u’s, 472;
possible forms of polynomials, 472-473;
effect of errors on differences, 473-477;
effect on differences of subdividing an
interval, 477; breaking-up a group,
477-479; formula for halving a group,
479-480; graduation, 480-485; inverse
interpolation, 485-487; estimation of
the position of a maximum, 487-488;
modifying central ordinates to equivalent
areas, 489; refs., 524, (Italian)
526-527.

Interval, Subdivision of, 477. 

Intraclass correlation, 253-258; coefficient
of, 255-258; limits to negative values
of coefficient, 256-257; in analysis of
variance, 448.

Inverse interpolation, 485-487.

Irwin, J. O., refs., Recent advances, 495;
sampling distribution of means, 518;
χ² test, 521; analysis of variance, 522;
frequency-distribution of means of 
samples, 522. 

Isotropy, Def., 72; generally, 71-74; of
normal correlation table, 237-239;
refs., 500.

Isserlis, L., refs., Partial correlation ratios,
511; conditions for real significance of
probable errors, 519; fitting polynomials
(Tchebycheff), 515; probable
error of mean, 522; small samples
(under Greenwood), 522.

Jacob, S. M., refs., Crops and rainfall,
512-513.

Jeffery, G. B., refs., Sampling (under
Pearson), 523. 

Jeffreys, H., refs., Scientific Inference, 495.

Jensen, A., refs., Sampling methods, 519. 

Jevons, W. S., Use of geometric mean, 
130; refs., system of numerically 
definite reasoning (theory of attributes), 
499; Pure Logic and other Minor Works,
499; Investigations in Currency and 
Finance , 502. 

John, V., refs., Der Name Statistik, 498;
Geschichte der Statistik, 498.

Jordan, C., refs., Time series, 512; curve
fitting, 515; Statistique mathématique,
496, 515.


J-shaped frequency-distributions, 98-101. 

Kapteyn, J. C., refs., Skew Frequency-
curves in Biology and Statistics, 502, 507.

Kelley, T. L., refs., Correlation, 511;
tables to facilitate the computation of
correlation coefficients, 525; Statistical
Method, 496; K. Statistical Tables, 529.

Kelvin, Lord, Dictum on measurement 
and knowledge, 1.

Kendall, M. G., ref. to paper on Spear¬
man’s correlation coefficient, 410. 

Keynes, J. M., refs., A Treatise on Prob¬ 
ability , 495, 516. 

Khotimsky; see Chotimsky.

Kick of a horse, Deaths from, following
Poisson distribution, 191. 

King, George, Graduation of age statistics, 
483-485.

Kiser, C. V., refs., Bias in sampling, 516.

Knibbs, Sir G. H., refs., Price index-
numbers, 503; frequency-curves, 508. 

Kohlweiler, E., refs., Statistik im Dienste
der Technik, 497.

Kohn, S., refs., Theory of Statistical Method,
496. 

Kondo, T., refs., Standard error of mean
square contingency, 519; of standard 
deviation, 519. 

Koren, J., refs., History of Statistics, 498.

Kurtosis, Def., 165; calculation of, 165;
of binomial series, 174; of Poisson
distribution, 189-190; effect on standard
error of standard deviation, 400.

Labour Gazette, Index-number, refs., 503.

Labourers, Agricultural, Minimum wage- 
rates of; see Agricultural labourers’ 
earnings; see also Earnings.

Laplace, Pierre Simon, Marquis de,
Normal distribution, 169; refs., Théorie
analytique des Probabilités, 504, 519.

Latshaw, V. V., refs., Curve fitting (under
Davis), 515. 

Le Roux, J. M., refs., Sampling, 522.

Leading term and leading differences, 463. 

Least squares, Method of, in fitting 
regression lines, 209-210, 262-263; in
fitting curves generally, 309-331; equations,
312-313.

Lee, Alice, Data cited from, (Table 6.9),
98, 125, (Table 11.3), 199; refs., generalised
probable error in multiple correlation
(under Pearson), 510; inheritance of
fecundity and fertility (under Pearson),
513.

Lemna minor, Correlation between lengths
of mother- and daughter-frond in, 218-
221; rate of growth of, 316-318.

Leptokurtic curves, 165.

Lester, A. M., Unpublished data on screw 
measurements, (Table 6.3), 84. 

Levels of significance, in χ² test, 424-425;
in t-test, 440; in z-test, 444.





Levy, H., refs., Elements of Probability, 495.

Lévy, P., refs., Calcul des probabilités, 529.

Lexis, W., Use of term “precision,” 144;
alternative approach in sampling of
attributes, 368-369; refs., Abhandlungen
zur Theorie der Bevölkerungs-
und Moralstatistik, 490, 510; Theorie
der Massenerscheinungen, 510.

Linear constraints, 415. 

Linearity of regression, 207; tests for,
245, 409, 455-456.

Lipps, G. F., refs., Measures of dependence
(association, correlation, contingency,
etc.), 499, 500; Fechner's
Kollektivmasslehre, 501.

Little, W., Data as to agricultural 
labourers’ earnings cited from (Ex. 
11.2), 224. 

Livi, L., refs., Elementi di Statistica, 520.

Logarithmic increase in population,
127-129; in duckweed, 316-318.

Loss in weight in soils, Percentage; see 
Percentage. 

Lottery sampling, 340-341.

Macaulay, F. G., refs., Smoothing time
series, 512.

Macdonell, W. R., Data cited from (Table
6.6), 90.

Manifold classification; see Classification. 

March, L., refs., Index-numbers, 502;
correlation, 512.

Marriage rate and trade, Correlation of
movements, 294-296.

Marriages, Australian; see under
Australian.

Marshall, A., refs., Money, Credit and
Commerce, 502.

Martin, E. S., refs., Corrections to
moments, 504.

Maximum, Estimation of position of,
487-488.

McAlister, Sir Donald, refs., Law of 
geometric mean, 502. 

McKay, A. T., refs., Sampling distribution
of correlation coefficient, 519.

McNemar, Q., refs., Partial correlation
(under Kelley), 511.

Mean, Arithmetic: generally, 114-120;
def., 114; nature of, 114; calculation
of, for a grouped distribution, 115-118;
influence of grouping, 118, 119-120;
position relatively to mode and median,
125; diagram (fig. 7.2), 118; sum of
deviations from, is zero, 118; of series
compounded of others, 119; of sum or
difference, 119-120; comparison with
median, 122-124, 287; summary
comparison with median and mode, mean is
best for all general purposes, 125-126;
reciprocal character compared with
harmonic mean, 130-131; of binomial
distribution, 173; of Poisson distribution,
189; weighting of, 302-306;
standard error of, 380-387, 388-391;
means of two samples, 387-388, (small
samples) 442-443; estimates of, 434-435;
refs., 501-502, 517-520, 521-524,
(Italian) 527.

Mean deviation; see Deviation, Mean. 

- error, 141; see Error, Standard;
Deviation, Standard.

Mean, Geometric, 114; generally, 126-130;
def., 126; calculation, 126; less than
arithmetic mean, 126; difference from
arithmetic mean in terms of dispersion,
(Ex. 8.12), 153; of series compounded
of others, 127; of series of ratios or
products, 127; in estimating intercensal
populations, 127-129; convenience
for index-numbers, 129-130;
weighting of, 306.

Mean, Harmonic, 114; generally, 130-131;
def., 130; calculation, 130; is
less than arithmetic and geometric
means, 131; difference from arithmetic
mean in terms of dispersion (Ex. 8.13),
153; reciprocal character compared
with arithmetic mean, 130-131; in
theory of sampling, when numbers in
samples vary (Ex. 19.11), 372.

Mean square error, 141.

Mean, Weighted, 302-306; def., 302;
difference between weighted and unweighted
means, 303-304; applications of weighting
to corrections of death-rates, etc.,
for age- and sex-distribution, 305-306;
refs., 514.

Median, 114; generally, 120-124; def.,
120; indeterminate in certain cases,
120; unsuited to discontinuous
observations and small series, 120-121;
calculation of, 121; graphical
determination of, 121-122; comparison with
arithmetic mean, 122-124, 387;
advantages in special cases, 123-124;
slight influence of outlying values on,
124; position relative to mean and
mode, 125, (fig. 7.2), 118; weighting of,
306; standard error of, 380-385; refs.,
517-520.

Meidell, H. B., refs., Sampling, 519, 523.

Meitzen, P. A., refs., Geschichte, Theorie
und Technik der Statistik, 498.

Mendelian breeding experiments as
illustrations, 44, 130, 353; refs.,
fluctuations of sampling in, 516-517.

Mentality, Relationship with weight in a
selection of criminals (Table 5.6), 78.

Mercer, W. B., Data cited from (Ex. 6.5
(b)), 110.

Method of least squares; see Least squares. 

Methods, Statistical, Purport of, 8; def., 3. 

Mice, Numbers in litters, Harmonic mean,
130; proportions of albinos in litters,
fluctuations compared with theory of
sampling (Exs. 19.12 and 19.13), 372.

Migration, Random, refs., 508. 






Milk-yield in cows, Correlation with age 
(Table 11.4), 200; (fig. 11.9), 212; 
constants (Ex. 11.8), 225; correlation 
ratios (Ex. 19.1), 259. 

Milton, John, Use of word “statist,” 4.

Miner, J. R., Tables for calculation of
correlation coefficients, 525.

Mises, R. von, refs., Wahrscheinlichkeit,
Statistik und Wahrheit, 495; English
translation, 529;
Wahrscheinlichkeitsrechnung, 490.

Mixed sampling, 336, 347-348.

Mode: generally, 124-125; def., 124;
approximate determination from mean
and median, 125; diagram showing
position relative to mean and median
(fig. 7.2), 118; weighting of, 306; refs.,
502.

Modifying central ordinates, 489. 

Modulus as measure of dispersion, 141;
see Precision.

Mogno, R., refs., Interpolation, 520. 

Mohl, R. von, refs., Geschichte und
Literatur der Staatswissenschaften, 498.

Moir, H., refs., Frequency-curves
(mortality), 508.

Molina, E. C., refs., Bayes' theorem, 528.

Moments: first, def., 110; second, def.,
135; general, def., 154; expression of
moments about mean in terms of those
round an arbitrary point, 155-156;
calculation of, 156-159; Sheppard's
corrections for, 160; of bivariate
distribution, footnote, 211; standard
errors of, 394-404; correlation between
errors in, 394-404; refs., 505, 517-520.

Moments, Examples of, Height distribution,
156-158, 160; marriage distribution,
158-159, 160; weight distribution
(Ex. 9.1), 167; milk yield distribution
(Ex. 9.5), 167-168.

Montessus de Ballore, R. de, refs.,
Probabilités et Statistiques, 490.

Moore, L. Bramley, Data cited from,
(Table 6.9), 98; refs., inheritance of
fertility and fecundity (under Pearson),
513.

Morant, G., refs., Poisson distribution, 
500. 

Mortality; see Death-rates. 

Mortara, G., refs., Lezioni di Statistica
Metodologica, 520.

Movements, Correlation, in two variables,
Methods, 292-296; refs., 512-513.

Multiple correlation coefficient, 277-279;
calculation of, 278; relation with
measure of closeness of fit for simple
curves, 329; use of standard error in
judging significance of, 409; testing
significance of, 456-458; see Correlation.

Negative classes and attributes, 18. 

Newbold, Ethel M., Application of partial
correlation methods to coefficients not
determined by product-moment method,
footnote, 270; refs., frequency-distributions,
accidents, 500.

Newsholme, Sir A., refs., Birth-rates,
correction for age-distribution, 514;
Vital Statistics, 497.

Newton's formula, in interpolation, 464-468;
binomial coefficients in (Table
24.1), 470.

Neyman, J., refs., Representative method
in sampling, 510; use and interpretation
of test criteria, 521, 528; χ² distribution,
521; small samples, 528.

Niceforo, A., refs., La Méthode statistique,
490, (Il Metodo Statistico, 520); La
Misura della Vita, 507.

Nixon, J. W., refs., Experimental test of
normal law, 500, 507.

Normal dispersion, in Lexis' sense, 369.

Normal distribution, 169; generally,
180-187; deduction from binomial
distribution, 177-180; ordinates, 182-183;
table of ordinates, Appendix
Table 1; areas, 183-184; table of
areas, Appendix Tables 2 and 3;
standard deviation, 182; mean deviation,
182; moments, 182; β₁ and β₂,
182; seminvariants, 182; fitted to a
given distribution (fig. 10.3), 187;
quartile deviation, 184-185; range ±3σ
cuts off all but small fraction of whole,
185; as an error distribution, 185-186;
occurrence of, in Nature, 186; place of,
in theory, 186-187; numerical examples
of use of tables, 183-184; normality
of sampling distributions, 437-438;
refs., general, 505-506; dissection of
compound curve, 508. For normal
correlation, normal surface, see
Correlation, Normal.

Norton, J. P., Data cited from (Table 11.5),
201; refs., Statistical Studies in the
New York Money Market, 512.

Numerical data, Statistics concerned with,
2.

Nybølle, H. C., refs., Theorie der Statistik,
497.

Oats, Home-grown, Index-number of
prices of, Correlated with price index of
animal feeding-stuffs (Table 11.7), 203,
275-278.

Ogive curve, Galton's, 150-157.

Oldis, E., refs., Sampling of correlation
coefficient (under Cheshire), 522.

Oppenheim, A., refs., Charlier's form of
the frequency function (under Aitken),
515.

Order of a class, 14; of generalised
correlations, regressions, deviations, and
standard deviations, 201; of multiple
correlation coefficient, 278.

Orthogonal polynomials, 324. 

Osculatory interpolation, 484.





Pabst, M., refs., Sampling of rank
correlation coefficient (under Hotelling),
522.

Paciello, U., refs., Variation, 527.

Pairman, E., refs., Corrections to moments,
504.

Palgrave, Sir R. H. I., Dictionary of
Political Economy, 498.

Parabolas, Fitting of, to data, 309-331;
def., 310; degree of, 310.

Parameters, Statistical, def., footnote,
378.

Pareto, V., refs., Cours d'Économie
politique, 501.

Parkes, A. S., refs., Sampling of
attributes, 510.

Partial association; see Association, 
Partial. 

- correlation; see Correlation, Partial. 

Pauperism, Correlation with earnings and
out-relief (Ex. 11.2), 224, 270-272;
with out-relief, proportion of aged, etc.,
272-275, 288-291.

Pearl, R., refs., Probable errors, 519;
Introduction to Medical Biometry, 497.

Pearse, G. E., Data cited from, (Table 
6.14), 100; refs., corrections to moments, 
504. 

Pearson, E. S., refs., The Application of
Statistical Methods to Industrial
Standardisation, 490; tests for normality,
519; probable errors, 519; distribution
of range, 519; polychoric coefficients,
500; χ² test, 521; use and interpretation
of test criteria, 521, 528; sampling
distribution of correlation coefficient,
521, 522, 528; small samples generally,
528.

Pearson, Karl: contingency, 68-69;
“correction” to coefficient of contingency,
footnote, 69; coefficient of variation,
149; definition of β's, footnote, 161;
skewness, 162; binomial apparatus,
170; system of curves, 192; relationship
between normal correlation and
contingency, 289; sampling methods,
399; data cited from, 78, (Ex. 5.1),
79-80, 98, 125, 199; refs., historical
notes, 498; biography of Galton, 498;
obituary of Pearson by Yule, 498;
correlation of characters not
quantitatively measurable, 499; contingency,
etc., 500, 501; mode, 502; standard
deviation, 504; coefficient of variation,
504; correction to moments, 504;
influence of broad categories on
correlation, 504; frequency curves and
correlation, 506-507; binomial
distribution and machine, 507;
hypergeometric series, 507, 517; dissection of
compound normal curve, 508; general
methods of curve fitting, 507;
correlation and correlation ratio, 509, 510, 511,
512, 514; fitting of principal axes and
planes, 510, 515; testing fit of
regression and other curves, 510; inheritance
of fertility, 513; correlation between
indices, 514; weighted mean,
reproductive selection, 514; curve fitting,
515; sampling of attributes, 516-517;
probable errors, 519-520; sampling
generally, 519, 523; tables of
probability integrals for small samples, 519,
528; χ² distribution, 521; small
samples, 528; (Editor) Tracts for
Computers, 525; Tables for
Statisticians and Biometricians, 525; Tables
of B-function, 525; Tables of
Gamma-Function, 525; Tables of Elliptic
Integrals, 525.

Pearson curves, 192. 

Peas, Applications of theory of sampling 
to experiments in crossing, 358. 

Pecten, Correlation between two diameters
of shell, 197; constants (Ex. 11.8), 225.

Pepper, J., refs., Sampling, 519, 520.

Percentage loss in weight, Relation with
temperature, for certain soils (Table
17.5), 322; curve fitted to data, 320-328;
diagram (fig. 17.5), 324.

Percentage, Standard error of, 351; when 
numbers in samples vary (Ex. 19.11), 
372; see also Sampling of attributes. 

Percentiles, 150-151; def., 150;
advantages and disadvantages, 151; use for
unmeasured characteristics, 150-151;
standard errors of, 380-382;
correlation between errors of sampling in,
385; refs., 501, 517-520.

Perozzo, L., refs., Applications of theory 
of probability to correlation of ages at 
marriage, 508. 

Persons, W. M., refs., Index-numbers, 503. 

Petals of Ranunculus bulbosus, Frequency
of (Ex. 6.5 (d)), 110; unsuitability of
median in case of such a distribution,
120.

Peters, J., refs., Multiplication tables, 524.

Petty, Sir William, refs. (under Hull),
Economic Writings, 498.

Pietra, G., refs., Interpolating plane curve,
515; Statistica, 520; interpolation,
520; variation, 528; statistical
relations, 528.

Platykurtic curves, 165.

Plaut, H., refs., Anwendungen der math.
Statistik auf Probleme der
Massenfabrikation, 497.

Poincaré, H., refs., Calcul des probabilités,
495, 510.

Poisson, S. D., 169; refs., Sex-ratio, 517;
Recherches sur la probabilité des
jugements, 500.

Poisson distribution, 169, 187-191; mean,
standard deviation, third and fourth
moments, 189-190; seminvariants, 190;
frequency polygons (fig. 10.4), 190;
illustrations, 191; ref. to tables of, 190.





Polynomials, Fitting of, to data, 309-331;
degree of, 310; shortcomings of, 329;
orthogonal, 324; differences of, 464;
possible forms of, in interpolation,
472-478; see Curve fitting; Interpolation.

Poppies, Stigmatic rays on, Frequency
(Table 6.2), 84; unsuitability of median
in case of such a distribution, 120.

Population, Estimation of, between
censuses, 127-129; curve fitted to growth
of, in England and Wales, 325-327;
refs., 502.

Positive classes and attributes, Def., 13;
number of positive classes, 17;
sufficiency of, for tabulation, 17; expression
of other frequencies in terms of, 20-21.

Precision, 144; def., 186; of estimates, 
335; varies with square root of number 
of observations, 357. 

Pretorius, S. J., Data cited from, (Table
6.8), 96, (Table 6.10), 99; refs., skew
frequency surfaces, 511.

Prices, Index-numbers of, 129-130; use
of geometric mean in, 129-130; refs.,
502-503.

Principal axes, in correlation, 231; in
fitting straight lines, footnote, 314.

Probability, and statistical inference, 9-10, 
385; use of, in sampling distributions,
375-376; refs., 57 0, (Italian) 527. 

- integral, 183; see Normal distribution. 

Probable error; see Error, Standard.

Pseudo frequency-distributions, 105, 108. 

Punched cards, Recording of information
on, 76-77.

Purposive sampling, 336, 346-347.

Quartile deviation; see Quartiles.

Quartiles, quartile deviation and
semi-interquartile range, 147-148; generally,
147-149; defs., 147, 148;
determination of, 147-148; ratio of q.d. to
standard deviation, 148, 149;
advantages of q.d. as measure of dispersion,
149; difference between deviations of
quartiles from median as measure of
skewness, 162; q.d. of normal curve,
184-185; standard errors, 380-382,
385-386; refs., 501, 517-520.

Quetelet, L. A. J., Lettres sur la théorie des
probabilités (Ex. 19.2), 371.

Random sampling, 336-345; technique
of, 339-345; numbers (Tippett's),
341-344; importance of, 345-346; see
Sampling; Simple sampling.

Range, as measure of dispersion, 134. 

Ranks, 150-151; rank correlation, 246-249;
relationship with grades and
grade correlation, 249-251; sampling
of rank correlation coefficient, 410.

Ranunculus bulbosus, Frequency of petals
(Ex. 6.5 (d)), 110; unsuitability of 
median for such distributions, 120. 


Reed, L. F., refs., Curve fitting, 515. 

Registrar-General: Correction or
standardisation of death-rates, 305; refs.,
514; estimates of population, refs.,
502; data cited from Reports of, 40-41,
59-60, 83, 100, 198, 292-294, 294-295,
301, (Table 17.6), 326, (Table 19.1), 364,
364-365, 365-366, 468.

Regressions generally, 206-211; def.,
curves of, 207, coefficients of, 213;
total and partial, 262-263; curvilinear,
207; test of curvilinearity, 245, 409;
reduction to linear form in certain
cases, 212-213; standard errors of
coefficients, 408-409; test of significance
of, 443; test of linearity of, 455-456;
refs., 510-511, 514-515.

Reserves and discounts in American
banks, Correlation (Table 11.5), 201,
(fig. 11.2), facing 204.

Residuals, 311; sum of squares minimised
by method of least squares, 311-312;
calculation of sum of squares of,
327-328.

Rhind, A., refs., Tables for computing
probable errors, 520.

Rhodes, E. C., refs., Law of error, 508;
fitting polynomials, 515; sampling, 517,
520.

Rider, P. R., Data cited from, 374; refs.,
recent advances, 49a; small samples, 
523. 

Rietz, H. L., refs., Frequency-distributions,
508; small samples, 523;
Mathematical Statistics, 496; (Ed.) Handbook
of Mathematical Statistics, 497.

Ritchie-Scott, A., refs., Correlation of
polychoric table, 500.

Robinson, G., refs., Calculus of
Observations, 496, 515, 524.

Robinson, S., refs., Experiments on the
χ² test, 521.

Romanovsky, V., refs., Frequency-curves, 
508; multiple regressions, 511; curve 
fitting, 515; sampling, 523, 524. 

Room space, Deficiency in, data from 1931
Census Housing Report (Table 5.5), 77.

Ross, Sir R., refs., Frequency-curves
(Epidemiology), 508.

Roth, L., refs., Elements of Probability,
495.

Royer, E. R., refs., Contingency, 500. 

Russell, W. T., refs., Medical Statistics,
497.

Rutherford, Lord, refs., Poisson
distribution, 500.

Salisbury, F. S., refs., Correlation, 511 
(under Kelley). 

Salvemini, T., refs., Interpolation, 526. 

Salvosa, L. R., refs., Tables of Pearson's
Type III Function, 525.

Sampling, Theory of - introductory
remarks, 9-10; preliminary notions,
generally, 332-348; types of sampled
universe, 332-334; estimation from
samples, 334-335; precision of
estimates, 335; types of sampling, 336;
random sampling, 336-340; bias, 337-339;
technique of random sampling,
339-340; lottery sampling, 340-343;
Tippett's numbers, 341-344; sampling
from infinite universes, 344-345; from
hypothetical universes, 345;
importance of random sampling, 345-346;
purposive sampling, 336, 346-347;
mixed sampling, 336, 344, 347-348;
stratified sampling, 336, 347-348;
simple sampling, 350; sampling
distributions, 373-377; refs., 510.

Sampling of attributes - conditions
assumed in simple sampling, 350;
standard deviation of number or
proportion of successes in n events,
350-352; examples from artificial chance,
352-353; standard error, 353; probable
error, 353-354; case when proportion
of successes is estimated from the data,
354-355; examples, 355-356; case
when chance of success or failure is
small, 356; standard error independent
of size of universe, 356-357; precision,
357; limitations of simple sampling,
357-358; comparing a sample with
theory, 359-360; comparing one sample
with another independent thereof,
360-361; comparing one sample with
another combined with it, 361-362;
effect of removing conditions of simple
sampling, 362-368; application to
sex-ratio, 363-365; sampling from limited
material, 367; alternative approach,
368-369; refs., 516-517. See also
Binomial distribution; Normal
distribution; Correlation, Normal.

Sampling of variables, Large samples:
generally, 373-412; sampling
distributions, 373-375; use of, 375-377; simple
sampling, 378-379; approximations in
theory of large samples, 379-380;
standard error, 380; for standard error
of particular parameters, see under
Error, Standard, or under the particular
parameter; comparison of two samples,
387-388, 402-403; effect of breakdown
of simple sampling conditions on
standard error of mean, 388-391;
general theorems on standard errors of
moments, 394-398; effect of Sheppard's
corrections on standard errors, 399;
refs., 517-520.

Sampling of variables, Small samples:
generally, 434-461; estimates, 434;
of arithmetic mean, 434-435; of
variance, 435-436; degrees of freedom of
estimates, 436-437; tests of
significance, 437; assumption of normality,
437-438; t-distribution, 438-442;
applied to two samples, 442-443; to
significance of regression coefficients,
443; z-distribution, 443-444; analysis
of variance, 444-449; significance of
correlation coefficient, 449-453;
Fisher's transformation for, 453-458;
t-test for, 453; significance of correlation
ratio in uncorrelated universe, 453-455;
of measure of linearity of regression,
455-456; of multiple correlation
coefficient, 456-458; refs., 521-524.

Sanders, H. G., refs., Field
Experimentation, 497.

Saunders, Miss E. R., Data cited from, 44.

Savorgnan, F., refs., Variation, 528. 

Scale reading, Bias in, 80-87.

Scarlet fever, Ages at death from, (Table
6.11), 100; (fig. 6.11), 101; mean, 117;
median, 121.

Scatter diagram, 205-206; generalised,
275-277.

Scheibner, W., Difference between
arithmetic and geometric, arithmetic and
harmonic means (Exs. 8.12 and 8.13),
153.

Scottish Milk Records Association, 408. 

Screws, Measurements on (Table 6.3), 84.

Semi-interquartile range; see Quartiles.

Seminvariants, Def., 165; calculation of,
166; of normal distribution, 182; of
Poisson distribution, 190; standard
errors (Ex. 21.6), 412.

Sex-ratio of births, Correlation with total
births (Table 11.6), 202, 212, (fig. 11.10),
213, 215 240; constants (Ex. 11.3),
225; applications of theory of sampling
to, 363-365; refs. (under Vigor), 517;
standard error of ratio of male to
female births (Ex. 19.8), 371.

Shakespeare, W., Use of the word
“statist,” 4.

Shea, J. D., refs., Fitting polynomials
(under Birge), 515.

Sheppard, W. F., Correction of standard
deviation and higher moments for
grouping, 160, 399; theorem on
correlation of normal distribution grouped
around medians (Ex. 12.4), 240; refs.,
calculation and correction of moments,
505; normal curve and correlation, 500;
theory of sampling, 510, 520.

Shewhart, W. A., refs., Engineering
Applications of Statistical Method, 497;
Economic Control of Quality of
Manufactured Product, 497; small samples,
524.

Shohat, J. (Chokhate, J.), refs., Sampling,
524.

Significance, Levels of; see Levels of
significance; tests of significance, 335-336, 437.

Simple curve fitting; see Curve fitting.

Simple interpolation, 402. 

Simple sampling of attributes, 350-353;
limitations of, 357-359; applications of,
359-362; effect of removing
limitations of, 362-368; simple sampling of
variables, 378-379; effect on standard
error of mean of removing limitations,
388-391.

Sinclair, Sir John, Use of words
“statistical,” “statistics,” 4-5.

Sipos, A., refs., Time series, 518. 

Skew or asymmetrical
frequency-distributions, 94-98; see also
Frequency-distributions.

Skewness, 96, 98; measures of, 162-164; 
standard error of Pearson’s measure of, 
407. 

Small chances, 191; see Poisson
distribution.

— samples; see Sampling of variables, 
Small samples. 

Smith, B. B., refs., Time correlation, 513. 

Smith, C. D., refs., Tchebycheff
inequalities, 524.

Snedecor, G. W., refs., Calculation and
Interpretation of Analysis of Variance,
524.

Snow, E. C., refs., Estimates of
population, 512; lines and planes of closest fit,
515.

Soil, Relationship between temperature
and percentage loss in weight; see
Percentage loss in weight.

Solomons, L. M., refs., Limits to a measure
of skewness, 505.

Soper, H. E., refs., Tables of Poisson
Distribution, 506; Frequency Arrays,
508; probable error of correlation
coefficient, 520; of bi-serial expression
for correlation coefficient, 520;
sampling, 520, 524.

“Sophister” (pseudonym), refs., Small 
samples, 521. 

Southey, Robert, cited re Cosin's “Names
of the Roman Catholics, etc.,” 105.

Spahlinger vaccine for tuberculosis in
cattle, Example, 425-426.

Spearman, C., “Foot-rule” coefficient of 
rank correlation, footnote, 219; effect 
of errors of observation on the standard 
deviation and correlation coefficient, 
298-299; refs., effect of errors of 
observation, 518; rank method of 
correlation, 510, 518. 

Spurious correlation of indices, 300-301;
refs., 513-514.

Standard deviation; see Deviation,
Standard.

— error; see Error, Standard; for standard 
error of a particular parameter, see 
under that parameter or under Error, 
Standard. 

Standardisation of death-rates, 305-306;
refs., 514.

“Statist,” Occurrence of the word in 
Shakespeare and Milton, 4. 

“Statistic,” Use of singular form, 3-4.


Statistical, Introduction and development 
in meaning of the word, 4-5; Statistical 
Account of Scotland, 4; Royal Statistical 
Society, 5; scope of statistical methods, 
2-10; design of statistical inquiries, 385. 

Statistical series, Interpolation of,
468-470.

Statistics, Introduction and development
in meaning of word, 4-6; def., 3;
theory of, def., 8; sketch of field of,
6-10; popular attitude towards, 10.

Stature, Correlation of, for father and son:
(Table 11.3), 199; diagrams (fig. 11.3),
facing 204, and (fig. 11.8), 211;
constants (Ex. 11.8), 225; correlation
ratios, 245; testing for normality,
232-237; for isotropy, 238-239; diagonal
distribution (fig. 12.2), 234; contour
lines (fig. 12.3), 230.

Stature of males in the United Kingdom:
(Table 6.7), 91, (fig. 6.6), 95;
calculation of mean, 117, and of median, 121;
of means and medians of individual
countries (Ex. 7.1), 131; of standard
deviation, 138-139; of percentiles, 151;
of mean deviation, 146; of s.d., m.d.
and quartiles of individual countries
(Ex. 8.1), 152; of third and fourth
moments, 156-158, 160; of β₁ and β₂,
161; of skewness, 163-164;
distribution fitted to normal curve (fig. 10.3),
187; standard errors of mean and
median, 384; of first to ninth deciles,
385; of standard deviation, 400-401;
of third and fourth moments, 404;
correlation between errors in mean and
s.d., (Ex. 21.5), 412.

Stead, H. G., refs., Correlation coefficients,
518.

Steffensen, J. F., refs., Recent Researches,
496, 521; interpolation, 521.

Stevenson, T. H. C., refs., Birth-rates,
correction of, for age distribution (under
Newsholme), 514.

Stigmatic rays on poppies, Frequency;
see Poppies.

Stirling, James, Expression for factorials 
of large numbers, 178. 

Stoessiger, B., refs., Probability integrals 
for small samples (under Pearson), 519, 
523. 

Straight line fitted to data, 313;
reduction of non-linear data to linear form,
316-320.

Stratified sampling, 336, 347-348.

“Student” (pseudonym), Mnemonic for
platy- and lepto-kurtosis, 165;
standard deviation of distribution of rank
correlation coefficient, 410; refs.,
Poisson distribution, 506; elimination
of spurious correlation due to position
in time or space, 513; probable errors,
520; distribution of means of samples
not drawn at random, 520; probable
error of mean (t-distribution), 524;
small samples, 524.

“Student's” t-distribution, 438-443; form
of, 439; tables of, 439-440, and
Appendix Table 5; applications of,
440-442; comparison of two samples,
442; significance of regression
coefficients, 443; significance of
correlation coefficient, 453.

Subdivision of intervals, in interpolation, 
477. 

Subnormal dispersion, in Lexis' sense, 369.

Sugar beet, Determination of sugar
content, as illustration of sampling
technique, 347-348.

Supernormal dispersion, in Lexis' sense,
369.

Sur- and super-tax, Data on incomes
liable to, (Table 6.5), 89; median,
upper quartile and ninth decile (Ex.
8.3), 153.

t-distribution; see “Student's”
t-distribution.

Tables of functions, etc., refs., 524-525;
see also under subject headings.

Tabulation of statistics of attributes, 14,
22; of a frequency-distribution, 88-89;
of a correlation table, 197-198.

Tangential interpolation, 48b 

Tappan, M., refs., Partial correlation, 511.

Tchebycheff, refs., Fitting polynomials
(see Isserlis), 515; means, 520;
inequality (under Camp), 521, (under
Smith, C. D.), 524.

Tchouproff, Tchuprow, etc., see
Tschuprow.

Tedeschi, T., refs., Interpolation, 527.

Temperature and percentage loss in
weight of certain soils; see Percentage
loss in weight.

Tests of significance, 335-336; with χ²,
418-421; small samples, 437. See also
Sampling of variables, Small samples.

Tetrachoric r, 251-252; differs from
product-moment correlation coefficient,
253; standard error of, 408.

Thiele, T. N., refs., The Theory of
Observations, 505.

Thomson, G. H., refs., The Essentials of
Mental Measurement, 496; computation
of regression coefficient, etc., 511.

Thorndike, E. L., refs., Methods of
measuring correlation, 510.

Ticket sampling, 340. 

Time-correlation problem, 292-296; refs.,
512-513.

Tippett, L. H. C., Sampling numbers,
341-344; sampling distributions
obtained by use of, 374-375; refs.,
extremes of samples (under Fisher), 522;
The Methods of Statistics, 497, 529.

Tocher, J. F., Data cited from, (Ex. 9.3),
167, 168; (Table 11.4), 200; correlation
of milk-yield and butter fat, 408; refs.,
contingency (under Pearson), 500.

Todhunter, I., refs., History of the
Mathematical Theory of Probability, 498.

Trachtenberg, M. I., refs., Property of
the median, 504.

Transvariazione, refs. (Italian), 527-528. 

Truncated frequency-distributions,
102-103.

Tschebycheff, P. L.; see Tchebycheff.

Tschuprow, A. A., Coefficient of
contingency, 70-71; refs., Korrelationstheorie,
490; English translation, 529; partial
correlations, 511; mathematical
expectations of moments, 520; distribution
of means, 524.

Tuberculosis in cattle, Vaccine for,
Example, 425-426.

Type of array, Def., 190.

Types of universe, 332-334; of sampling,
336.

Ultimate classes and frequencies, Def.,
15-16; sufficiency of, for tabulation, 16.

Undertakings, Electricity; see Electricity.

Universe, Def., 25; specification of, 20;
types of universe for sampling purposes,
332-334; finite and infinite universes,
332-333; universe of universes, 334.

U-shaped frequency-distributions,
101-102, 101.

Value of estates in 1715 (Table 6.12), 105,
(fig. 6.13), 103.

Variables, Theory of, Generally, 82-308;
sampling of, generally, 373-461; see
Sampling of variables.

Variance, for square of standard
deviation, 135; standard error of, 399;
estimates of, 434-435; analysis of,
see Analysis.

Variate, Def., footnote, 82; see Variables.

Variate-difference correlation method,
292-296, 477; refs., 512-513.

Variation, Coefficient of, 149-150;
standard error of, 405-406.

Variation, refs. (Italian), 527.

Velocity-distance relation among
extra-galactic nebulae, (Table 17.1), 309-310;
straight line fitted to, (fig. 17.1), 310,
315-316.

Venere, A., refs., Means, etc. (under Gini),
527.

Venn, John, refs., Logic of Chance, 495,
510, 517.

Veronese, G., refs., Interpolation, 527.

Verschaffelt, E., refs., Measure of relative
dispersion, 504.

Vigor, H. D., Data cited from, (Table 11.6),
202; refs., sex-ratio, 517.

Vinci, F., refs., Variation, 528.

Wages, Minimum rates for agricultural
labourers, see Agricultural labourers;
of agricultural labourers, correlated
[illegible]; see Earnings.

Walker, Helen M., refs., History of
Statistical Method, 408.

Warner, F., refs., Defects in school-children,
notation for statistics of attributes, 400.

Water analysis, Methods of, refs., 506.

Waters, A. C., refs., Estimating inter-censal
populations, 502.

Weather and crops, Correlation, 201-202;
refs., 512. 

Weight of criminals, Relation with
mentality (Table 5.6), 78.

— of males in the United Kingdom
(Ex. 6.6), 111; mean, median and mode
(Ex. 7.3), 132; standard deviation,
mean deviation and quartiles (Ex. 8.2),
152; moments, β₁, β₂ and skewness
(Exs. 9.1 and 9.2), 167; standard
error of mean (Ex. 20.5), 392; of median
and quartiles (Ex. 20.3), 392; of
standard deviation (Ex. 21.1), 412.

Weighted mean; see Mean, Arithmetic;
also Mean, Geometric; Median; Mode.

Weldon, W. F. R., Dice-throwing, (Table
6.15), 107, 351, 410, 423-424.

Westergaard, H., refs., Theorie der Statistik,
407; Contributions to the History of
Statistics, 408.

Wheat-shoots, Distribution of (Table
18.1), 338.

Whipple, G. C., refs., Vital Statistics, 407.

Whitaker, Lucy, Data cited from, (Ex.
10.17), 104-105; refs., Poisson distribution,
507.

Whiting, M. H., Data cited from, (Table
5.6), 78.

Whittaker, E. T., refs., Calculus of Observations,
406, 515, 521.

Wicksell, S. D., refs., Correlation, 513;
in case of non-linear regression, 511.

Wilks, S. S., refs., Analysis of variance, 524.

Will, H. S., refs., Curve fitting, 513.

Willcox, W. F., Citation of Bielfeld, 4.

Willis, J. C., Data regarding Chrysomelidae
(Table 6.13), 106.

Wilson, G. S., and others, Use of coefficient
of variation, 150; refs., The Bacteriological
Grading of Milk, 505.

Winters, F. W., refs., Small samples (under
Shewhart), 524.

Wishart, John, refs., Field Experimentation,
407; sampling distributions, 520,
524.

Wolfenden, H. H., refs., Mortalities and
death-rates, 514.

Woo, T. L., Relationship between laterality
of hand and laterality of eye (Ex.
5.10), 81; tables for testing significance
of correlation ratio and multiple correlation
coefficient, 455, and refs., 524.

Woods, Frances, refs., Index-numbers,
503; index-correlations (under Brown),
511, 513.

Woods, Hilda M., refs., Medical Statistics , 
407. 

Working, H., refs., Time series, 513.
Working classes, Cost of living, refs., 503.

Yates, F., Data cited from, (Table 18.1),
338; refs., bias in sampling, 516.

Yield of grain, Data on, (Ex. 6.5 (r)), 110,
(Table 23.2), 446.

— of milk, Correlated with age in cows; 
see Milk-yield. 

Young, A. A., refs., Age statistics, 501.
Yule, G. Udny, Problem of pauperism,
288-291; use of principal axis in curve
fitting, footnote, 314; data cited from,
40, 42, 86, 106, (Table 11.6), 202, (Table
11.0), facing 218, 351-352, 446, 456;
refs., history of words “statistics,”
“statistical,” 408; obituary of Karl
Pearson, 198; attributes, association,
consistence, etc., 400, 500; isotropy,
influence of bias in statistics of qualities,
500; determination of mode, 502;
frequency curves, 506; application of
Poisson distribution, 506; correlation,
509, 510, 511, 520; pauperism, 512, 513;
birth-rates, 513, 514; time correlation
problem, 513; correlation between
indices, 514; sex-ratio, 517; fluctuation
of sampling in Mendelian ratios, 517;
probable errors, 520; χ² in case of
association and contingency tables, 521.

z-distribution, see Fisher’s z-distribution.
Zimmerman, E. A. W., Use of words
“statistics,” “statistical,” in English, 4.
Zimmerman, H., refs., Multiplication
tables, 525.

Zizek, F., refs., Die statistischen Mittelwerthe
and translation, 502.

β-coefficients, 161; standard errors of,
406.

B-function, Tables of, refs., 525; use of,
in z-test, 414.

γ-coefficients, 101.
Γ-function, Tables of, refs., 525.

χ², generally, 413-433; analogy with
Lexis’ Q, 360; def., 416-417; distribution,
417; tabulation of P for, 418, 425;
cf. also Appendix Table 4 and diagram
/Al; use as test of significance, when
cell frequencies are known a priori,
418-421; properties of the distribution,
422; normality for large ν, 422; conditions
on application of test, 422-423;
effect of taking into account signs of deviations,
423-424; levels of significance,
424-425; additive property of, 426-427;
estimation of theoretical frequencies
from data, 427-429; experiments on, 429-430;
goodness of fit, 480; refs., 520-521.


PRINTED IN GREAT BRITAIN BY NEILL AND CO., LTD., EDINBURGH, 








